CN110619642B - Method for separating seal and background characters in bill image



Publication number
CN110619642B
Authority
CN
China
Prior art keywords
image
seal
background
bill
separating
Prior art date
Legal status
Active
Application number
CN201910835331.XA
Other languages
Chinese (zh)
Other versions
CN110619642A (en)
Inventor
王俊峰
高琳
唐鹏
李征
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
2019-09-05
Filing date
2019-09-05
Publication date
2022-02-01
Application filed by Sichuan University
Priority to CN201910835331.XA
Publication of CN110619642A
Application granted
Publication of CN110619642B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/90 Determination of colour characteristics
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a method for separating a seal from background characters in a bill image. First, bill images containing seals are collected to build a bill seal data set; a target detection model based on a convolutional neural network is then trained on the annotated data set. The trained model detects and locates the seal image regions, and each extracted seal image undergoes a color space transformation. Blind source separation of the digital image then separates the seal from the background characters, and finally the separated seal image and background character image are segmented to obtain the final result images. The method is robust to complex conditions such as uneven illumination and noise interference, is broadly applicable to seals and background characters of any color or shape, and separates the two accurately while retaining both the information inside the seal and the background character information, which improves the accuracy and reliability of bill character recognition.

Description

Method for separating seal and background characters in bill image
Technical Field
The invention belongs to the field of computer digital image processing and relates in particular to a method for separating a seal from background characters in a bill image.
Background
Bills are transaction vouchers used by enterprises and individuals in commercial activities, and their number has increased sharply with China's rapid economic development. The financial data management systems now in common use make querying and managing bill information much more convenient, but a considerable part of that information is still obtained from paper bills. Traditionally it is collected by financial staff typing it in by hand. Because the volume of information is huge, this requires a great deal of manual labor, and because the reliability of manual entry cannot be guaranteed, still more labor must be spent on later proofreading. As financial information management improves, higher demands are placed on the accuracy and efficiency of bill data entry. With digital image recognition technology, bill characters can be located and extracted quickly and accurately, the bill information can be obtained through character recognition, and entry can be completed automatically. This greatly improves entry efficiency, reduces the manpower and material resources required, and lowers the risk of errors caused by manual operation.
Bills are generally stamped with the special seals of tax or financial departments. Because the stamping position of some seals is not fixed, a seal may cover or overlap important information on the bill, which seriously interferes with subsequent character recognition. Bill image recognition therefore usually has to restore the information covered by the seal before recognition. The traditional seal removal method assumes that the seal and the bill characters differ in color and separates them by separating color channels. However, seals come in many colors, seals of nominally the same color may deviate substantially from the standard color because of differences in ink and other factors, and the seal color is often hard to define and quantify accurately. In addition, the seal itself contains text information that financial staff also need, so simply removing the seal does not meet practical requirements; the seal and the background characters in the image must both be recovered.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for separating a seal from background characters in a bill image that separates the two accurately, improves the accuracy and reliability of bill character recognition, and provides effective data for subsequent bill character recognition.
To solve the above technical problem, the invention adopts the following technical scheme:
A method for separating a seal from background characters in a bill image comprises the following steps:
step 1: denoise the acquired bill images, annotate the position and size of the seal in each image, and build a bill seal data set;
step 2: train a target detection model based on a convolutional neural network on the annotated data set to obtain seal detection model parameters;
step 3: detect the bill image to be separated with the trained seal detection model, locate the seal regions in the bill image, and extract the seal region data;
step 4: perform a color space transformation on the extracted seal region to obtain a transformed image;
step 5: separate the seal in the transformed image from the background characters by blind source separation of the digital image, specifically:
step 51: remove the mean from the seal region in each of the three channels (hue, saturation and value), i.e. subtract the image mean from the seal region so that the mean pixel value becomes zero;
step 52: whiten the mean-removed image to obtain a whitened image;
step 53: separate the seal and the background characters from the whitened image with an independent component extraction method;
step 6: segment the separated seal and background characters and remove the interference of background objects to obtain the final images.
Further, step 1 specifically comprises:
step 11: apply Gaussian smoothing to the bill image to denoise it, obtaining a denoised bill image sample;
step 12: annotate the denoised bill image sample with the position coordinates of the seal, store the annotation information as text, and build the data set together with the original bill images.
Further, step 2 specifically comprises:
step 21: pre-train the convolutional neural network on a public image data set to obtain the initial parameters of the detection model;
step 22: further train the detection model on the established seal data set to obtain the seal detection model parameters.
Further, step 4 specifically comprises:
step 41: convert the seal image from the RGB color space to the HSV color space;
step 42: decompose the image into the three channels of hue, saturation and value (brightness).
Further, in step 6, the images are segmented with the Otsu adaptive thresholding algorithm.
Compared with the prior art, the invention has the following beneficial effects: 1) it is robust to common complex conditions such as uneven illumination and noise interference; 2) it is broadly applicable, since the scheme can separate seals and background characters of any color or shape; 3) the target detection model based on a convolutional neural network locates all seal image regions in a bill image quickly and accurately; 4) blind source image separation separates the seal from the background characters accurately while retaining both the information inside the seal and the background character information, which improves the accuracy and reliability of bill character recognition.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a seal image extracted by the method of the present invention.
FIG. 3 is a schematic diagram of the decomposition of an image into three channel images, H (hue), S (saturation) and V (value, i.e. brightness), by the method of the present invention.
FIG. 4 is a schematic diagram of the segmentation result of the seal image separated by the method of the present invention.
FIG. 5 is a schematic diagram of the segmentation result of the background character image separated by the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the method for separating a seal from background characters in a bill image comprises the following steps:
Step one: collect a batch of bill images containing seals and denoise them as preprocessing, filtering each bill image with a Gaussian smoothing filter to remove its Gaussian noise. Then annotate the bounding rectangle of every seal in each bill image, recording the coordinates of the rectangle's upper-left corner together with its width and height. Store the annotation results as text files, which together with the original bill images form the bill seal data set. Removing the noise from the bill images eliminates its interference in later processing, and annotating the positions of the seal regions makes it easy to build the seal data set and train the seal detection model.
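As a rough illustration of step one, the sketch below applies a separable Gaussian smoothing filter with NumPy and formats one annotation line per seal. The filter width (`sigma`), the edge padding and the space-separated `x y w h` line layout are assumptions, since the text does not specify them; in practice a library call such as OpenCV's `cv2.GaussianBlur` would normally do the smoothing.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_smooth(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur of an H x W or H x W x C image (edge padding)."""
    kernel = gaussian_kernel_1d(sigma)
    r = len(kernel) // 2
    img = image.astype(np.float64)
    pad = [(r, r), (r, r)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")

    def blur(v: np.ndarray) -> np.ndarray:
        # 'valid' mode trims the padding back off, restoring the original length.
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(blur, 1, padded)  # filter horizontally
    return np.apply_along_axis(blur, 0, rows)    # then vertically

def format_annotations(boxes) -> str:
    """One seal per line: top-left x, top-left y, width, height (assumed layout)."""
    return "".join(f"{x} {y} {w} {h}\n" for x, y, w, h in boxes)
```

The returned string can be written to a text file stored next to each bill image; the file naming scheme is left open here, as it is in the text.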
Step two: train the seal detection model that will be used to detect the bill images to be separated. First, pre-train the network model on the public ImageNet data set to obtain its initial parameters; then convert the seal annotation data into the format required for training the Fast RCNN network model and train the pre-trained model on it. Pre-training on a public data set gives the detection model initial parameters that can already extract the features of general objects. Subsequent training on the seal data set then transfers the model quickly to the seal target even though the seal data set contains relatively few samples, which improves training efficiency.
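The text does not state which format the Fast RCNN training code expects. As one hedged example, torchvision's detection models take per-image targets whose boxes are `[x1, y1, x2, y2]` corner coordinates with integer class labels, label 0 being reserved for background. The sketch below parses the hypothetical `x y w h` annotation lines from step one and converts them to that corner form; `parse_annotation_text`, `to_corner_format` and `seal_label` are illustrative names, not part of the patent.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]

def parse_annotation_text(text: str) -> List[Box]:
    """Parse 'x y w h' lines (hypothetical layout) into (x, y, w, h) tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        x, y, w, h = (float(p) for p in parts)
        boxes.append((x, y, w, h))
    return boxes

def to_corner_format(boxes: List[Box], img_w: int, img_h: int,
                     seal_label: int = 1) -> Dict[str, list]:
    """Convert (x, y, w, h) boxes to [x1, y1, x2, y2], clipped to the image."""
    corner_boxes = []
    for x, y, w, h in boxes:
        x1, y1 = max(0.0, x), max(0.0, y)
        x2, y2 = min(float(img_w), x + w), min(float(img_h), y + h)
        if x2 > x1 and y2 > y1:  # drop boxes that are empty after clipping
            corner_boxes.append([x1, y1, x2, y2])
    # Every annotated object is a seal, so all boxes share one class label.
    return {"boxes": corner_boxes, "labels": [seal_label] * len(corner_boxes)}
```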
Step three: input the bill image into the trained detection model; from the model output, select the rectangular regions whose confidence is greater than 0.9 (confidence ranges from 0 to 1) as seal image regions, and extract the data of these regions as independent seal images, as shown in FIG. 2.
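A minimal sketch of the selection in step three, assuming the detector returns pixel boxes as `[x1, y1, x2, y2]` with one confidence score per box (the exact output layout depends on the detection framework). Only boxes whose confidence is strictly above 0.9 are kept, and each is cropped into its own seal image; the outward rounding and the clipping to image bounds are implementation choices not stated in the text.

```python
import numpy as np

def extract_seal_regions(image: np.ndarray, boxes, scores, threshold: float = 0.9):
    """Crop every detected box whose confidence exceeds `threshold`.

    `boxes` holds [x1, y1, x2, y2] pixel coordinates and `scores` holds
    confidences in [0, 1] (assumed detector output layout).
    """
    h, w = image.shape[:2]
    crops = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score <= threshold:
            continue  # keep only results with confidence > threshold
        # Round outward to whole pixels and clip to the image bounds.
        c1, r1 = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
        c2, r2 = min(w, int(np.ceil(x2))), min(h, int(np.ceil(y2)))
        if c2 > c1 and r2 > r1:
            crops.append(image[r1:r2, c1:c2].copy())
    return crops
```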
Step four: apply a color space transformation to the extracted seal image to obtain the corresponding transformed image. Specifically, the original seal image is an RGB three-channel color image and is converted to the HSV color space; the HSV image is then split into the H, S and V grayscale images corresponding to its three channels, and each grayscale image is treated as an independent observation image in the subsequent steps (as shown in FIG. 3). Once the color channels are decomposed, the image data of each channel can be analyzed independently. Decomposing the image into channels with three different attributes (hue, saturation and value) provides the observation data for the subsequent blind source image separation. The blind source separation method used requires the number of observed images to be no smaller than the number of source images; a seal image on its own provides only one observed image, so color space transformation and channel decomposition are used to increase the number of observed images.
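The conversion and channel split in step four can be sketched in NumPy as below. It follows the `colorsys` convention, with H, S and V all in [0, 1]; OpenCV's `cv2.cvtColor` with 8-bit input instead stores H in the range 0 to 179, so the scale of the observation images depends on the library used. Stacking the three channels as rows gives a 3 x (number of pixels) observation matrix, the usual input layout for ICA. The function names are hypothetical.

```python
import numpy as np

def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """Vectorized RGB -> HSV for float input in [0, 1]; same formulas as colorsys."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                      # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    cc = np.where(c > 0, c, 1.0)                  # avoid division by zero
    h = np.where(v == r, ((g - b) / cc) % 6.0,
                 np.where(v == g, (b - r) / cc + 2.0, (r - g) / cc + 4.0))
    h = np.where(c > 0, h / 6.0, 0.0)             # hue in [0, 1); gray pixels get 0
    return np.stack([h, s, v], axis=-1)

def hsv_observations(seal_rgb_uint8: np.ndarray) -> np.ndarray:
    """Return a 3 x (H*W) matrix whose rows are the H, S and V observation images."""
    hsv = rgb_to_hsv(seal_rgb_uint8.astype(np.float64) / 255.0)
    return hsv.reshape(-1, 3).T
```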
Step five: perform blind source separation of the digital image through independent component analysis to separate the seal in the seal image from the background characters. Specifically, for the three images of the same seal, namely the grayscale images corresponding to the H, S and V channels, compute the mean gray value of each image and subtract it from every pixel of that image, so that the mean gray value of each image becomes zero; then whiten the mean-removed images to obtain whitened images. Whitening removes the correlation between the image features and gives them the same unit variance, which facilitates the subsequent independent component extraction. Finally, separate the seal and the background characters from the whitened images with the FastICA independent component extraction algorithm.
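The processing in step five can be sketched as follows: each channel row is centered (step 51), PCA-whitened (step 52) and passed to a one-unit FastICA with the `tanh` nonlinearity and Gram-Schmidt deflation (step 53). The nonlinearity, iteration limits and `n_components=2` (one component for the seal, one for the background characters) are assumptions; the patent names FastICA but gives no parameters, and scikit-learn's `sklearn.decomposition.FastICA` is a maintained alternative. ICA recovers components only up to sign, scale and order, so deciding which output is the seal, and whether to invert it, is a separate step this sketch leaves out.

```python
import numpy as np

def center_and_whiten(x: np.ndarray, n_components: int) -> np.ndarray:
    """Zero-mean each row (step 51), then PCA-whiten to n_components rows (step 52)."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    keep = np.argsort(eigvals)[::-1][:n_components]  # largest n_components
    d, e = eigvals[keep], eigvecs[:, keep]
    whitening = (e / np.sqrt(d)).T                   # D^(-1/2) E^T
    return whitening @ x                             # uncorrelated, unit-variance rows

def fastica_deflation(z: np.ndarray, n_components: int, max_iter: int = 200,
                      tol: float = 1e-6, seed: int = 0) -> np.ndarray:
    """One-unit FastICA (tanh nonlinearity) with Gram-Schmidt deflation (step 53)."""
    rng = np.random.default_rng(seed)
    w_all = np.zeros((n_components, z.shape[0]))
    for p in range(n_components):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ z)
            # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, with g' = 1 - tanh^2.
            w_new = (z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Deflation: stay orthogonal to the components already extracted.
            w_new -= w_all[:p].T @ (w_all[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol   # sign flips count as converged
            w = w_new
            if done:
                break
        w_all[p] = w
    return w_all @ z                                 # one estimated component per row

def separate_components(observations: np.ndarray, n_components: int = 2) -> np.ndarray:
    """observations: 3 x N matrix of H, S, V rows; returns n_components x N."""
    return fastica_deflation(center_and_whiten(observations, n_components), n_components)
```

Each returned row can be reshaped to the seal crop's height and width, for example `components[0].reshape(h, w)`, to view it as an image.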
Mean removal and whitening simplify the subsequent independent component extraction and improve its convergence and stability.
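Why whitening simplifies the extraction can be shown in a few lines under the standard noise-free ICA model assumptions: zero-mean, mutually independent sources scaled to unit variance, and as many retained whitened channels as sources. Write the centered observations as a mixture x = As, and let UΛUᵀ be the eigendecomposition of their covariance, keeping only the leading eigenpairs:

```latex
\begin{aligned}
\mathbf{x} &= \mathbf{A}\mathbf{s}, \qquad
\mathbb{E}[\mathbf{x}\mathbf{x}^{\top}] = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top}, \qquad
\mathbf{V} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^{\top} \\
\mathbf{z} &= \mathbf{V}\mathbf{x} = \mathbf{V}\mathbf{A}\,\mathbf{s} = \tilde{\mathbf{A}}\,\mathbf{s} \\
\mathbf{I} &= \mathbb{E}[\mathbf{z}\mathbf{z}^{\top}]
 = \tilde{\mathbf{A}}\,\mathbb{E}[\mathbf{s}\mathbf{s}^{\top}]\,\tilde{\mathbf{A}}^{\top}
 = \tilde{\mathbf{A}}\tilde{\mathbf{A}}^{\top}
\end{aligned}
```

Since the effective mixing matrix Ã is orthogonal after whitening, the unmixing matrix only has to be searched among orthogonal matrices. This is what lets FastICA work with unit-norm weight vectors plus a decorrelation step, and it is one reason the iteration converges more reliably on whitened data.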
Step six: segment the separated seal image and background character image separately with the Otsu adaptive thresholding algorithm to remove the interference of background objects, obtaining the final seal image (shown in FIG. 4) and the final background character image (shown in FIG. 5).
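Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram. The sketch below implements it directly in NumPy after min-max rescaling an ICA output to 8 bits; OpenCV provides the same operation through `cv2.threshold` with the `cv2.THRESH_OTSU` flag. Because ICA components have arbitrary sign, the characters may fall on either side of the threshold and the mask may need inverting; that choice and the rescaling are assumptions, not taken from the text.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu threshold t for an 8-bit image; pixels > t form the upper class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # lower-class probability for each t
    mu = np.cumsum(prob * np.arange(256))      # lower-class cumulative mean
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; set to 0 where one class would be empty.
    sigma_b2 = np.where(denom > 0,
                        (mu_total * omega - mu) ** 2 / np.where(denom > 0, denom, 1.0),
                        0.0)
    return int(np.argmax(sigma_b2))

def to_uint8(component: np.ndarray) -> np.ndarray:
    """Min-max rescale a real-valued ICA component to 0..255."""
    lo, hi = component.min(), component.max()
    if hi > lo:
        scaled = (component - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(component, dtype=np.float64)
    return np.round(scaled * 255).astype(np.uint8)

def binarize(component: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels above the Otsu threshold of the rescaled component."""
    gray = to_uint8(component)
    return gray > otsu_threshold(gray)
```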

Claims (5)

1. A method for separating a seal from background characters in a bill image, characterized by comprising the following steps:
step 1: denoise the acquired bill images, annotate the position and size of the seal in each image, and build a bill seal data set;
step 2: train a target detection model based on a convolutional neural network on the annotated data set to obtain seal detection model parameters;
step 3: detect the bill image to be separated with the trained seal detection model, locate the seal regions in the bill image, and extract the seal region data;
step 4: perform a color space transformation on the extracted seal region to obtain a transformed image;
step 5: separate the seal in the transformed image from the background characters by blind source separation of the digital image, specifically:
step 51: remove the mean from the seal region in each of the three channels (hue, saturation and value), i.e. subtract the image mean from the seal region so that the mean pixel value becomes zero;
step 52: whiten the mean-removed image to obtain a whitened image;
step 53: separate the seal and the background characters from the whitened image with the FastICA independent component extraction method;
step 6: segment the separated seal image and background character image and remove the interference of background objects to obtain the final images.
2. The method for separating a seal from background characters in a bill image according to claim 1, wherein step 1 specifically comprises:
step 11: apply Gaussian smoothing to the bill image to denoise it, obtaining a denoised bill image sample;
step 12: annotate the denoised bill image sample with the position coordinates of the seal, store the annotation information as text, and build the data set together with the original bill images.
3. The method for separating a seal from background characters in a bill image according to claim 1, wherein step 2 specifically comprises:
step 21: pre-train the convolutional neural network on a public image data set to obtain the initial parameters of the detection model;
step 22: further train the detection model on the established seal data set to obtain the seal detection model parameters.
4. The method for separating a seal from background characters in a bill image according to claim 1, wherein step 4 specifically comprises:
step 41: convert the seal image from the RGB color space to the HSV color space;
step 42: decompose the image into the three channels of hue, saturation and value (brightness).
5. The method according to claim 1, wherein in step 6 the images are segmented with the Otsu adaptive thresholding algorithm.
CN201910835331.XA 2019-09-05 2019-09-05 Method for separating seal and background characters in bill image Active CN110619642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910835331.XA CN110619642B (en) 2019-09-05 2019-09-05 Method for separating seal and background characters in bill image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910835331.XA CN110619642B (en) 2019-09-05 2019-09-05 Method for separating seal and background characters in bill image

Publications (2)

Publication Number Publication Date
CN110619642A (en) 2019-12-27
CN110619642B (en) 2022-02-01

Family

ID=68922592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910835331.XA Active CN110619642B (en) 2019-09-05 2019-09-05 Method for separating seal and background characters in bill image

Country Status (1)

Country Link
CN (1) CN110619642B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949333B (en) * 2019-03-20 2021-06-08 北京小蜻蜓智能科技有限公司 Character and seal separation method based on color unmixing
CN111368840A (en) * 2020-02-20 2020-07-03 中国建设银行股份有限公司 Certificate picture processing method and device
CN111401352B (en) * 2020-03-13 2023-10-20 深圳前海环融联易信息科技服务有限公司 Text picture underline identification method, text picture underline identification device, computer equipment and storage medium
CN111753785A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Seal detection method based on deep learning technology
TWI745068B (en) * 2020-09-02 2021-11-01 中國信託商業銀行股份有限公司 Method for establishing seal identification model and server terminal for establishing seal identification model
CN112651913B (en) * 2020-12-17 2024-03-29 广州市申迪计算机系统有限公司 Invoice seal desalination method, system, device and computer storage medium
CN113255657B (en) * 2020-12-31 2024-04-05 深圳怡化电脑股份有限公司 Method and device for detecting scratch on bill surface, electronic equipment and machine-readable medium
CN113065407B (en) * 2021-03-09 2022-07-12 国网河北省电力有限公司 Financial bill seal erasing method based on attention mechanism and generation countermeasure network
CN112766275B (en) * 2021-04-08 2021-09-10 金蝶软件(中国)有限公司 Seal character recognition method and device, computer equipment and storage medium
CN113449706A (en) * 2021-08-31 2021-09-28 四川野马科技有限公司 Bill document identification and archiving method and system based on artificial intelligence
CN114936965B (en) * 2022-06-07 2023-06-02 上海弘玑信息技术有限公司 Seal removing method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104268823A (en) * 2014-09-18 2015-01-07 河海大学 Digital watermark algorithm based on image content
CN109284758A (en) * 2018-09-29 2019-01-29 武汉工程大学 A kind of invoice seal removing method, device and computer storage medium
CN109636825A (en) * 2018-11-01 2019-04-16 平安科技(深圳)有限公司 Seal graphics dividing method, device and computer readable storage medium

Non-Patent Citations (2)

Title
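A Study of Image Processing for Automatic Counting Gas Cylinder Seal; Melani, R. et al.; PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON APPLIED ENGINEERING; 2018-12-31; main text, sections 1-3 *
基于独立分量分析的数字水印研究 [Research on digital watermarking based on independent component analysis]; 马雪飞 (Ma Xuefei); 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]; 2006-12-15; main text, chapters 2-4 *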

Also Published As

Publication number Publication date
CN110619642A (en) 2019-12-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant