CN110991265B - Layout extraction method for train ticket image - Google Patents


Info

Publication number
CN110991265B
Authority
CN
China
Prior art keywords
image
train ticket
character
ticket
contour
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911103715.9A
Other languages
Chinese (zh)
Other versions
CN110991265A (en
Inventor
王俊峰
唐鹏
高琳
陈懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201911103715.9A
Publication of CN110991265A
Application granted
Publication of CN110991265B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a layout extraction method for train ticket images. An acquired picture is adaptively binarized and its contours are simplified, and the convex quadrilateral with the largest area is extracted as the outer contour of the train ticket. A projective transformation matrix is calculated from the contour vertices, and the size and gray values of the train ticket image are standardized. Character lines are then detected, and undersized or oversized character lines are deleted from the character line set; the remaining character lines are clustered with DBSCAN according to their vertical coordinates; finally, attributes are assigned to the clustered and sorted character lines according to template rules to complete the layout analysis. The method improves the robustness of ticket-face analysis for train tickets, reduces the workload of financial staff, raises the level of automation of information entry in financial ticket systems, helps bridge paper invoices and electronic taxation, and promotes the adoption of intelligent invoice recognition technology.

Description

Layout extraction method for train ticket image
Technical Field
The invention relates to the field of automatic processing of train ticket images, in particular to a layout extraction method of a train ticket image.
Background
A train ticket is the voucher for a purchased rail journey; it is used when riding the train, at the exit check, and for subsequent financial reimbursement. Because China covers a vast territory, rail travel, which is both economical and efficient, is self-evidently important. With the construction of high-speed lines such as the Sichuan-Tibet railway, China's railway network is steadily improving, and rail travel will occupy an even more important position in the future. As China's railways expand internationally, there is an urgent need to recognize and analyze the contents of tickets, which serve as travel vouchers, automatically by means of information technology. Train ticket invoices remain, now and for a long time to come, an important part of financial reimbursement and play an important role in national economic development. Train tickets account for a very large share of the reimbursement workload, and their specialized printing specifications and layout call for automatic layout analysis and content recognition based on intelligent image processing.
The form and content of railway tickets reflect the level of modernization of railway construction in China. Chinese railway tickets have had different characteristics in different historical periods, changing from hard-card tickets to soft-paper tickets and then to magnetic-card tickets. After the founding of the People's Republic of China, the first generation of Chinese railway tickets were hard-card tickets measuring 57 × 25 mm, with braille printed on the ticket face. Trains were divided into fast and slow trains; the face of a fast-train ticket was printed with one red line and that of an extra-fast-train ticket with two red lines. The background colors of the ticket face were specified as follows: light blue for soft-seat tickets, light red for hard-seat tickets, light purple for suburban tickets, light green for simple tickets, orange-yellow for box tickets, and so on. In the 1980s, Shenzhen railway station was the first in China to sell tickets by computer, and tickets changed to the soft-paper type. In 1997, the Ministry of Railways fixed a uniform format for computer-issued tickets. Such tickets are not printed in advance but are printed on site at the time of sale by a thermal-transfer ticket machine using non-impact printing technology. In July 2007, hard-card tickets, which had been in use for more than 100 years, were gradually retired and fully replaced by nationwide networked electronic tickets, with stations selling soft-paper tickets. In 2008, magnetic-card tickets went on sale successively at railway stations in large and medium-sized cities in China.
The magnetic-card train ticket is a single-use ticket whose face is stiffer than that of a soft-paper ticket; a picture of a multiple-unit train is printed on the front and notices to passengers are printed on the back. It is printed on site at the time of sale by a hot-roll ticket machine using non-impact printing technology, and magnetic and thermal information is embedded in the back of the ticket. From 2009, the national railway ticketing system was upgraded: the one-dimensional anti-counterfeiting code at the bottom of the ticket was replaced by a two-dimensional code with stronger anti-counterfeiting capability. Both the common red soft-paper ticket and the light-blue magnetic-card ticket were upgraded. In 2011, after real-name ticketing was introduced, information such as a two-dimensional code, the purchaser's name and the identity card number was added to the ticket, with 4 digits of the identity card number replaced by asterisks to protect personal information. Also in 2011, the Beijing-Tianjin intercity railway was the first to trial online ticket sales, marking the entry of mainland China's railway ticketing into the Internet era and serving as a trial run for online ticket sales on the Beijing-Shanghai high-speed railway.
From June 2015, a new version of the railway ticket was trialed in some Chinese cities. The railway 12306 website published the new ticket style, in which the ticket face was adjusted and the advertisement area was removed, and announced that the new ticket would be trialed from June 25 of that year, that June 25 to July 31 would be a transition period in which new and old tickets coexisted, and that from August 1 the new ticket would be used exclusively.
The appearance of train tickets has been stable from 2015 to the present, and this work targets the train tickets of this period. In terms of content, train tickets mainly comprise passenger tickets and supplementary tickets. Passenger tickets are soft-seat and hard-seat tickets; supplementary tickets include express tickets, sleeper tickets, soft-sleeper tickets and the like. To give preferential treatment to children, students and disabled soldiers, China's railways also sell half-price tickets. Chinese train tickets include hard-card tickets, soft-paper tickets, magnetic-card tickets, electronic tickets and so on. The ticket face carries a variety of information, including the travel section, train number, departure station, time, seat number, seat class, fare and selling station. Train numbers are assigned according to the rules of the Ministry of Railways: the direction toward Beijing on a main line, from a branch line toward a main line, or a designated direction, is the up direction, and the train number is even; the direction away from Beijing on a main line, from a main line toward a branch line, or a designated direction, is the down direction, and the train number is odd. In terms of form, the two main versions are red and blue: the red version is pink and corresponds to tickets bought at a station window; the blue version is blue and corresponds to tickets bought online and collected at the station.
With rail transit boosting economic mobility, the demand for automated financial recognition of train tickets is urgent. Surveys indicate that most enterprises and institutions still prefer high-speed rail for business travel, so train tickets make up most of the transport reimbursement workload and need to be processed efficiently. At present, train ticket reimbursement still relies on traditional manual collection and entry, which costs a great deal of money and time. This raises operating costs, and the low efficiency prevents invoice information from being transmitted promptly and effectively, causing unnecessary capital outflow and affecting business performance. With a train ticket scanning and recognition interface, an enterprise can automatically capture invoice data and enter it into its management system as soon as an invoice is generated or received, achieving real-time processing and saving a great deal of time and cost, which makes it an important option in the coming era of artificial intelligence.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a layout extraction method for train ticket images that automatically locates and extracts the train ticket region from an invoice image, locates the character line boxes of the ticket-face content, and automatically matches them to the layout, thereby forming a retrieval method for train ticket image content that a subsequent character recognition function can call.
In order to solve the technical problems, the invention adopts the technical scheme that:
a layout extraction method of a train ticket image comprises the following steps:
Step 1: locating the train ticket region in the digital image, and cropping the train ticket image so as to standardize the scale and gray level of the train ticket, specifically:
1.1) reading the photoelectrically sampled digital image of the invoice uploaded by a scanner;
1.2) preprocessing the image, including denoising and smoothing filtering;
1.3) converting the color image into a grayscale image;
1.4) calculating the mean gray value Mb of the pixels within 50-pixel-wide bands at the left and right boundaries of the image, and calculating the mean gray value Mc of the pixels within a 100 × 100 pixel rectangle at the center of the image;
1.5) calculating a binary image;
1.6) if Mc obtained in step 1.4) is less than Mb, inverting the binary image obtained in step 1.5) so that black and white are swapped;
1.7) for the black-and-white image, extracting the white regions with a connected-component detection algorithm, and extracting the counterclockwise-ordered set of boundary points of each white region as the contour of a white blob;
1.8) simplifying each contour obtained in step 1.7) with a contour simplification algorithm;
1.9) traversing all simplified contours and deleting all non-quadrilateral contours and all concave polygon contours, so that only contours that have 4 vertices and are convex quadrilaterals remain;
1.10) on the assumption that the scanned object is the most prominent object in the image, selecting the contour with the largest area from the set of 4-vertex contours as the contour of the train ticket;
1.11) calculating a projective transformation matrix from the quadrilateral vertices of the contour, and applying the projective transformation to the train ticket image to obtain a standardized train ticket image, wherein the standardized train ticket size is given by previously acquired prior knowledge;
1.12) applying histogram equalization to the standardized train ticket image to standardize its gray levels;
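Steps 1.4) to 1.6) can be illustrated with a minimal Python sketch. The text does not say how the binary image of step 1.5) is computed, so the global threshold halfway between Mb and Mc used here is only a placeholder assumption, and the function names are hypothetical. The image is a plain list of rows of gray values.

```python
def border_and_center_means(img, border=50, center=100):
    """Return (Mb, Mc): mean gray of the left/right border bands and of the central square."""
    h, w = len(img), len(img[0])
    left = [p for row in img for p in row[:border]]
    right = [p for row in img for p in row[w - border:]]
    mb = sum(left + right) / len(left + right)
    x0 = max((w - center) // 2, 0)
    y0 = max((h - center) // 2, 0)
    middle = [p for row in img[y0:y0 + center] for p in row[x0:x0 + center]]
    mc = sum(middle) / len(middle)
    return mb, mc


def ticket_as_white(img, border=50, center=100):
    """Binarize so that the central object (assumed to be the ticket) ends up white."""
    mb, mc = border_and_center_means(img, border, center)
    threshold = (mb + mc) / 2  # assumption: stand-in for the unspecified binarization
    binary = [[255 if p > threshold else 0 for p in row] for row in img]
    if mc < mb:
        # The center is darker than the border, so the ticket came out black: invert.
        binary = [[255 - p for p in row] for row in binary]
    return binary
```

Whatever the polarity of the input, the later contour steps can then always look for white regions.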
Step 2: performing adaptive layout analysis on the standardized train ticket image, specifically:
2.1) performing character line detection on the standardized invoice image to obtain a set of bounding rectangles of the character lines; the character line detection is performed by a pre-trained YOLO object detection model, and the model file is preloaded into memory for fast detection;
2.2) calculating the mean height of the character lines, and, using 0.5 times and 1.5 times the mean height as the lower and upper thresholds, deleting undersized or oversized character lines from the character line set and keeping the character lines of suitable size;
2.3) clustering the remaining character lines with the DBSCAN algorithm according to their vertical coordinates, wherein the threshold parameter of the DBSCAN clustering is set to 1 times the mean height;
2.4) sorting the character lines clustered into the same class in ascending order of horizontal coordinate;
2.5) obtaining the character line arrangement rules from the train ticket template, and assigning attributes to the clustered and sorted character lines according to the rules so that layout items correspond to character lines, wherein the train ticket template rules are acquired in advance;
2.6) outputting the layout analysis information.
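Steps 2.2) to 2.4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: boxes are hypothetical `(x, y, w, h)` tuples, the "vertical coordinate" is taken to be the box center, the DBSCAN neighbourhood radius is the mean height from step 2.2), and the minimum cluster size is assumed to be 1 (the text does not give it). With a minimum size of 1, one-dimensional DBSCAN reduces to splitting the sorted values wherever the gap between neighbours exceeds the radius, which is what the code does.

```python
def filter_by_height(boxes):
    """Drop boxes whose height is outside [0.5, 1.5] x the mean height."""
    mean_h = sum(b[3] for b in boxes) / len(boxes)
    kept = [b for b in boxes if 0.5 * mean_h <= b[3] <= 1.5 * mean_h]
    return kept, mean_h


def cluster_rows(boxes, eps):
    """Group boxes into rows by vertical center (1-D DBSCAN with min size 1)."""
    ordered = sorted(boxes, key=lambda b: b[1] + b[3] / 2)
    rows, prev_center = [], None
    for box in ordered:
        center = box[1] + box[3] / 2
        if prev_center is None or center - prev_center > eps:
            rows.append([])  # gap larger than eps: start a new row
        rows[-1].append(box)
        prev_center = center
    # Step 2.4): order each row from left to right.
    return [sorted(row, key=lambda b: b[0]) for row in rows]


def group_character_lines(boxes):
    """Return rows (top to bottom), each a list of boxes sorted left to right."""
    if not boxes:
        return []
    kept, mean_h = filter_by_height(boxes)
    return cluster_rows(kept, eps=mean_h)
```

For example, two boxes with nearly equal vertical centers end up in one row even if the detector returned them in right-to-left order.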
Further, step 1.8) is specifically as follows: each point on the contour is tentatively deleted in turn; if deleting the point changes the perimeter of the contour by less than a set threshold, the point is actually deleted; otherwise the point is kept and processing moves on to the next point. This is repeated until all points on the contour have been processed.
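A minimal Python sketch of this perimeter-based simplification is given below. The function name and the threshold value in the usage note are hypothetical, and the contour is assumed to be a closed polygon given as a list of `(x, y)` points. Removing a point P between neighbours A and B changes the perimeter by |AP| + |PB| - |AB|, so only that local quantity needs to be evaluated.

```python
import math


def simplify_contour(points, threshold):
    """Delete points whose removal changes the closed contour's perimeter by < threshold."""
    pts = list(points)
    i = 0
    while i < len(pts) and len(pts) > 3:
        prev_pt = pts[i - 1]                 # wraps around to the last point when i == 0
        next_pt = pts[(i + 1) % len(pts)]
        cur = pts[i]
        change = (math.dist(prev_pt, cur) + math.dist(cur, next_pt)
                  - math.dist(prev_pt, next_pt))
        if change < threshold:
            del pts[i]                       # the next point slides into index i
        else:
            i += 1
    return pts
```

Applied to a rectangle whose edges carry extra collinear points, the sketch leaves only the four corners, which is the form needed by the quadrilateral filter of step 1.9).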
Further, step 1.10) is specifically as follows: each polygon is scan-converted, and the total number of pixels covered by each contour is counted as its area; the index of the contour with the largest area is then selected, and the coordinates of the corresponding quadrilateral are read.
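The pixel-counting area can be sketched in Python with a scanline fill. This is a sketch under assumptions: a pixel is counted when its center lies inside the polygon (even-odd rule), polygons are lists of `(x, y)` vertices, and the function names are hypothetical.

```python
import math


def scanline_area(polygon):
    """Count pixels whose centers fall inside the polygon (even-odd scanline fill)."""
    ys = [p[1] for p in polygon]
    n = len(polygon)
    count = 0
    for row in range(math.floor(min(ys)), math.ceil(max(ys))):
        y = row + 0.5  # sample each pixel row at its center
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            if (y0 <= y) != (y1 <= y):  # the edge crosses this scanline
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        for left, right in zip(crossings[0::2], crossings[1::2]):
            # number of integers k with left <= k + 0.5 < right
            count += max(0, math.ceil(right - 0.5) - math.ceil(left - 0.5))
    return count


def largest_polygon_index(polygons):
    """Return the index of the polygon covering the most pixels."""
    areas = [scanline_area(p) for p in polygons]
    return max(range(len(polygons)), key=lambda i: areas[i])
```

For an axis-aligned 10 × 10 square the count is 100, matching the geometric area; for slanted edges the count is a discrete approximation.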
Compared with the prior art, the invention has the beneficial effects that:
1. The invention tolerates moderate variations in shooting angle and illumination.
2. The method handles not only the new train tickets issued after 2015 but also the old train tickets of 2011-2015, and its processing is flexible enough to adapt automatically to subsequent minor revisions of the ticket format.
3. The processing narrows the image region to be handled before recognition begins, which reduces the computational load of recognition and improves overall efficiency.
4. The method can be extended to other small invoice types.
5. The invention requires no complex mechanical equipment; it makes effective use of existing scanning equipment and extends its functions with an algorithm module.
Drawings
Fig. 1 is a schematic diagram of a wire-frame template of a train ticket face and a layout thereof.
Fig. 2 is a schematic diagram of a layout analysis process of a train ticket.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The invention aims to provide a layout analysis method for train tickets to solve the problem that current financial systems must enter large numbers of train tickets by manual collection, which is inefficient and error-prone. The method adapts to the different appearances of red and blue tickets and to the minor format differences between periods in recent years, and can process images automatically even with complex scanning backgrounds and skewed scanning angles. Using intelligent digital image processing, it lays a technical foundation for improving the efficiency of train ticket information entry, reduces the burden of ticket scanning and maintenance work, gives users a better experience than the traditional approach, is easier to deploy, and favors the routine, standardized adoption of autonomous OCR recognition of train tickets.
The invention uses invoice layout recognition software to provide a layout extraction interface for digital images of train tickets acquired by a mobile phone, a document camera (high-speed shooting instrument) or a digital scanner. The layout analysis software runs in the background and has no human-computer interaction interface. It monitors the scan output directory in the background; when a new, not yet analyzed image file appears, it analyzes the file and moves it to the directory for files whose analysis is complete. All processing is executed automatically in the background of the system and provides a key technical basis for the subsequent automatic recognition of train ticket contents.
The basic idea of the invention is as follows: a train ticket layout extraction method based on computer vision is integrated into an intelligent invoice recognition server; it takes mobile phone photos, document camera images and invoice scanner data as input, and consists mainly of a train ticket layout analysis algorithm module. The host computer reads the digital invoice image by driving the digital scanner and passes the data to the train ticket layout analysis algorithm module, which performs the following steps:
Step one: locating the train ticket region in the digital image, cropping the train ticket image, and standardizing its scale and gray level, specifically:
1) reading the photoelectrically sampled digital image of the invoice uploaded by the scanner;
2) preprocessing the image, including denoising and smoothing filtering;
3) converting the color image into a grayscale image;
4) calculating the mean gray value Mb of the pixels within 50-pixel-wide bands at the left and right boundaries of the image, and calculating the mean gray value Mc of the pixels within a 100 × 100 pixel rectangle at the center of the image;
5) calculating a binary image;
6) if Mc obtained in step 4 is less than Mb, inverting the binary image obtained in step 5 so that black and white are swapped;
7) for the black-and-white image, extracting the white regions with a connected-component detection algorithm, and extracting the counterclockwise-ordered set of boundary points of each white region as the contour of a white blob;
8) simplifying each contour obtained in step 7 with a contour simplification algorithm. The simplification can be summarized as follows: each point on the contour is tentatively deleted in turn; if deleting the point changes the perimeter of the contour by less than a set threshold, the point is actually deleted; otherwise the point is kept and the next point is processed. This is repeated until all points on the contour have been processed;
9) traversing all simplified contours and deleting all non-quadrilateral contours and all concave polygon contours, so that only contours that have 4 vertices and are convex quadrilaterals remain. Whether a contour is a quadrilateral is judged from its number of vertices; for each vertex, two vectors are constructed from its two adjacent edges and the sign of their cross product is calculated. For a convex polygon the cross products at all vertices have the same sign, i.e., all negative or all positive. Contours that obviously cannot be train tickets are thereby excluded;
10) on the assumption that the scanned object is the most prominent object in the image, selecting the contour with the largest area from the set of 4-vertex contours as the contour of the train ticket. Specifically, each polygon is scan-converted and the total number of pixels covered by each contour is counted as its area; the index of the contour with the largest area is selected, and the coordinates of the corresponding quadrilateral are read;
11) calculating a projective transformation matrix from the quadrilateral vertices of the contour, and applying the projective transformation to the train ticket image to obtain a standardized train ticket image, wherein the standardized train ticket size is given by previously acquired prior knowledge;
12) applying histogram equalization to the standardized train ticket image to standardize its gray levels.
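The convexity test of step 9 and the projective transformation matrix of step 11 can be sketched together in plain Python. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, the target size used in the usage note is a made-up value standing in for the prior-knowledge ticket size, and the 3 × 3 matrix is obtained by solving the standard 8-unknown linear system for four point correspondences with Gaussian elimination.

```python
def is_convex_quad(pts):
    """True if pts has 4 vertices and all edge cross products share one sign."""
    if len(pts) != 4:
        return False
    signs = []
    for i in range(4):
        (ax, ay), (bx, by), (cx, cy) = pts[i], pts[(i + 1) % 4], pts[(i + 2) % 4]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross == 0:
            return False  # three collinear vertices: treat as degenerate
        signs.append(cross > 0)
    return all(signs) or not any(signs)


def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 projective matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(hm, x, y):
    """Map one point through the projective matrix hm."""
    d = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / d,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / d)
```

A typical use would map the four detected ticket corners onto the corners of a rectangle of the standard size, for example `homography(corners, [(0, 0), (100, 0), (100, 60), (0, 60)])`, and then resample the image through the resulting matrix.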
Step two: performing adaptive layout analysis on the standardized train ticket image, specifically:
1) performing character line detection on the standardized invoice image to obtain a set of bounding rectangles of the character lines; the character line detection is performed by a pre-trained YOLO object detection model, and the model file is preloaded into memory for fast detection;
2) calculating the mean height of the character lines, and, using 0.5 times and 1.5 times the mean height as the lower and upper thresholds, deleting undersized or oversized character lines from the character line set and keeping the character lines of suitable size; the deleted character lines are likely to be false detections;
3) clustering the remaining character lines with the DBSCAN algorithm according to their vertical coordinates, wherein the threshold parameter of the DBSCAN clustering is set to 1 times the mean height; character lines clustered into one class belong to the same row of the layout but were detected as several character lines because of large gaps between them;
4) sorting the character lines clustered into the same class in ascending order of horizontal coordinate, so that the character lines in the same row are arranged from left to right;
5) obtaining the character line arrangement rules from the train ticket template, and assigning attributes to the clustered and sorted character lines according to the rules so that layout items correspond to character lines. The train ticket template rules are acquired in advance and can be briefly described as follows:
First row: ticket number (serial number) of the train ticket
Second row: departure station, train number and destination station
Third row: Chinese pinyin of the departure station and the destination station
Fourth row: departure time and seat information
Fifth row: fare and seat class
Sixth row: validity time
Seventh row: identity card number and name
Last row: sales information code
In addition, advertisement information and a two-dimensional code may appear between the seventh row and the last row, but since they are not part of the core content to be recognized by OCR, they are not considered in the template. The rules for matching character lines against the template are therefore:
the first character line of the first row corresponds to the ticket number;
the first character line of the second row corresponds to the departure station, the second character line to the train number, and the third character line to the destination station;
the first character line of the third row corresponds to the Chinese pinyin of the departure station, and the second character line to the Chinese pinyin of the destination station;
the first character line of the fourth row corresponds to the departure time, and the second character line to the seat information;
the first character line of the fifth row corresponds to the fare, and the second character line to the seat class;
the first character line of the sixth row corresponds to the validity time of the train;
the first character line of the seventh row corresponds to the identity card number and name;
the first character line of the last row corresponds to the sales information code;
if a row contains fewer character lines than the template expects, some character lines were missed by detection and a complete result cannot be output; the feedback information should then prompt the user to re-acquire the image;
6) outputting the layout analysis information;
7) exiting.
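The template matching rules above can be written as a small lookup table. The Python sketch below is only an illustration: the field names, the `assign_fields` function and the wording of the feedback message are hypothetical, and `rows` is assumed to be the list of rows produced by the clustering and sorting steps (top to bottom, each row sorted left to right).

```python
# Expected fields for rows 1-7, in left-to-right order (names are illustrative).
TEMPLATE = [
    ["ticket_number"],
    ["departure_station", "train_number", "destination_station"],
    ["departure_pinyin", "destination_pinyin"],
    ["departure_time", "seat_info"],
    ["fare", "seat_class"],
    ["validity_time"],
    ["id_number_and_name"],
]
LAST_ROW = ["sales_code"]
RETRY = "character lines missing; please re-acquire the image"


def assign_fields(rows):
    """Return (layout, None) on success or (None, message) if lines are missing."""
    if len(rows) < len(TEMPLATE) + 1:
        return None, RETRY
    layout = {}
    # Rows between the seventh and the last (advertisement, 2-D code) are skipped.
    pairs = list(zip(TEMPLATE, rows)) + [(LAST_ROW, rows[-1])]
    for names, row in pairs:
        if len(row) < len(names):
            return None, RETRY
        for name, box in zip(names, row):
            layout[name] = box
    return layout, None
```

Keeping the rules in a table rather than in code means that a minor revision of the ticket format only requires editing the table.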
Because the train ticket layout analysis operates directly on the high-resolution image, its results can be used directly by the subsequent invoice recognition module. The scope of train ticket character recognition is reduced from a full-image search to a search of designated regions, which greatly reduces computational complexity and speeds up the recognition process. Although high-quality layout analysis adds some computation, the recognition rate still improves overall because of the improved hit rate.
Table 1. Hardware list
Name                                                            Model
Digital image document camera (high-speed shooting instrument)  20 megapixels, A3/A4
Display screen                                                  17-inch LCD
Invoice recognition service computer                            I7, 16G, GTX2080Ti
User prompting device                                           FM buzzer
Description of hardware connection: the invoice image acquisition device is connected to the invoice recognition computer through a USB cable. The computer has the driver and application program of the scanner or document camera installed. Inside the scanner, the optical signal of the train ticket is converted by A/D conversion, assembled, and output as a digital image; after scanning is finished, the image is automatically uploaded through the device controller and driver to a designated directory on the invoice recognition computer and stored as a jpg file in chronological order. The train ticket recognition computer reads the data from storage and calls the layout analysis module to process it. The processing result is likewise stored as a file on the server hard disk for the subsequent character recognition stage.

Claims (3)

1. A layout extraction method of a train ticket image is characterized by comprising the following steps:
step 1: locating the train ticket area in the digital image, cropping the train ticket image, and standardizing the scale and the gray scale of the train ticket, specifically:
1.1) reading a photoelectric sampling digital image of the invoice uploaded by a scanner;
1.2) preprocessing the image, including denoising and smoothing filtering;
1.3) converting the color image into a gray image;
1.4) calculating a pixel gray mean value Mb for the pixels within a 50-pixel-wide band along the left and right boundaries of the image, and a pixel gray mean value Mc within a rectangle of 100 pixels in length and width at the center of the image;
1.5) calculating a binary image;
1.6) if Mc obtained in the step 1.4) is less than Mb, inverting the binary image obtained in the step 1.5) so that black and white are swapped;
1.7) for the black-and-white image, extracting the white areas by using a connected-component detection algorithm, and extracting the counterclockwise-ordered set of boundary points of each white area as the contour of that white blob;
1.8) simplifying each contour obtained in the step 1.7) by using a contour simplification algorithm;
1.9) traversing all simplified contours, deleting all non-quadrilateral contours and all concave polygon contours, so that only contours which have 4 vertices and are convex quadrilaterals remain;
1.10) selecting the contour with the largest area from the set of 4-vertex contours as the contour of the train ticket, on the assumption that the scanned object is the most prominent object in the image;
1.11) calculating a projective (perspective) transformation matrix from the quadrilateral vertices of the contour, and applying the projective transformation to the train ticket image to obtain a standardized train ticket image; wherein the standardized train ticket size is given by pre-acquired prior knowledge;
1.12) carrying out histogram equalization on the standardized train ticket image to realize gray level standardization processing;
step 2: performing self-adaptive layout analysis on the standardized train ticket image; the method specifically comprises the following steps:
2.1) carrying out character line detection on the standardized invoice image to obtain a circumscribed rectangle frame set of a plurality of character lines; the character line detection is realized by a pre-trained YOLO target detection model, and a model file is pre-loaded into a memory to realize rapid detection;
2.2) calculating the mean height of the character lines, deleting the character lines in the set that are too small or too large, using 0.5 times and 1.5 times the mean height respectively as thresholds, and retaining the character lines of suitable character size;
2.3) clustering the remaining character lines by their vertical coordinates using the DBSCAN algorithm; wherein the threshold parameter of the DBSCAN clustering is set to 1 times the mean height;
2.4) sorting the character rows aggregated to the same class according to the ascending order of the abscissa;
2.5) acquiring a character line arrangement rule according to the train ticket template, and distributing attributes to the clustered and sequenced character lines according to the rule to realize the correspondence between layout items and the character lines, wherein the train ticket template rule is acquired in advance;
2.6) outputting the layout analysis information.
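Steps 2.2) to 2.4) of claim 1 can be illustrated with a short sketch. It is not the patented implementation: the function name `group_text_lines`, the `(x, y, w, h)` box format, and the use of the box center as the "vertical coordinate" are assumptions. The clustering is a simplified one-dimensional stand-in for DBSCAN with a minimum cluster size of 1, in which sorted values are chained whenever the gap between neighbors does not exceed the threshold; a real DBSCAN library could be substituted.

```python
from statistics import mean


def group_text_lines(boxes):
    """Group detected text-line boxes (x, y, w, h) into rows.

    Returns a list of rows ordered top to bottom; each row is a list of
    boxes ordered left to right.
    """
    if not boxes:
        return []
    mean_h = mean(h for _, _, _, h in boxes)

    # Step 2.2: drop boxes shorter than 0.5x or taller than 1.5x the mean height.
    kept = [b for b in boxes if 0.5 * mean_h <= b[3] <= 1.5 * mean_h]

    # Step 2.3: cluster by vertical center with threshold = 1x mean height.
    # Assumption: 1-D chaining on sorted centers, equivalent to DBSCAN
    # with min_samples = 1 in one dimension.
    kept.sort(key=lambda b: b[1] + b[3] / 2)
    rows, prev_cy = [], None
    for b in kept:
        cy = b[1] + b[3] / 2
        if prev_cy is None or cy - prev_cy > mean_h:
            rows.append([])
        rows[-1].append(b)
        prev_cy = cy

    # Step 2.4: sort each row by ascending abscissa.
    return [sorted(row, key=lambda b: b[0]) for row in rows]
```

The resulting rows can then be matched against the pre-acquired train ticket template rule (step 2.5) to assign a layout attribute to each character line.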
2. The layout extraction method of a train ticket image as claimed in claim 1, wherein the step 1.8) is specifically as follows: tentatively deleting each point on the contour in turn; if the effect of the deletion on the perimeter of the contour is smaller than a set threshold, the point is actually deleted, otherwise the point is kept and processing moves on to the next point; and repeating these steps until all the points on the contour have been processed.
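The perimeter-based simplification of claim 2 can be sketched as below. The names `perimeter` and `simplify_contour` are illustrative, and the guard that stops deletion once only three points remain is an added assumption that keeps the contour a polygon; the claim itself does not state it.

```python
import math


def perimeter(points):
    """Perimeter of a closed contour given as a list of (x, y) points."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def simplify_contour(points, threshold):
    """Drop every point whose removal changes the perimeter by less than threshold."""
    pts = list(points)
    i = 0
    # Assumption: never reduce the contour below three points.
    while i < len(pts) and len(pts) > 3:
        candidate = pts[:i] + pts[i + 1:]
        if abs(perimeter(pts) - perimeter(candidate)) < threshold:
            pts = candidate  # point deleted; the next point now sits at index i
        else:
            i += 1           # point kept; move on to the next one
    return pts
```

For a square contour that also carries the midpoint of each side, the midpoints are removed (deleting them leaves the perimeter unchanged) while the four corners are kept.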
3. The layout extraction method of a train ticket image as claimed in claim 1, wherein the step 1.10) is specifically as follows: scan-converting (rasterizing) each polygon, counting the total number of pixels covered by each contour as its area, selecting the index of the contour with the largest area, and reading the corresponding quadrilateral coordinates.
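The area measure of claim 3 (count the pixels covered after scan conversion, then pick the largest) can be sketched in plain Python. The names `pixel_area` and `largest_quad` are hypothetical, and the rule that a pixel counts when its center lies inside the polygon is one common rasterization convention, assumed here because the claim does not specify one.

```python
import math


def pixel_area(poly):
    """Count the pixels whose centers fall inside the polygon (scanline fill)."""
    ys = [y for _, y in poly]
    total = 0
    for row in range(math.floor(min(ys)), math.ceil(max(ys))):
        yc = row + 0.5  # sample each scanline at the pixel-center height
        xs = []
        for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
            if (y0 <= yc) != (y1 <= yc):  # edge crosses this scanline
                xs.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        # Between each pair of crossings, count the pixel centers x + 0.5.
        for left, right in zip(xs[0::2], xs[1::2]):
            total += max(0, math.ceil(right - 0.5) - math.ceil(left - 0.5))
    return total


def largest_quad(quads):
    """Return (index, quad) of the quadrilateral covering the most pixels."""
    areas = [pixel_area(q) for q in quads]
    idx = max(range(len(areas)), key=areas.__getitem__)
    return idx, quads[idx]
```

Each polygon is expected as a list of `(x, y)` vertex tuples; the index returned by `largest_quad` corresponds to the "serial number" of the largest contour in the claim.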
CN201911103715.9A 2019-11-13 2019-11-13 Layout extraction method for train ticket image Active CN110991265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911103715.9A CN110991265B (en) 2019-11-13 2019-11-13 Layout extraction method for train ticket image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911103715.9A CN110991265B (en) 2019-11-13 2019-11-13 Layout extraction method for train ticket image

Publications (2)

Publication Number Publication Date
CN110991265A CN110991265A (en) 2020-04-10
CN110991265B true CN110991265B (en) 2022-03-04

Family

ID=70084170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911103715.9A Active CN110991265B (en) 2019-11-13 2019-11-13 Layout extraction method for train ticket image

Country Status (1)

Country Link
CN (1) CN110991265B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103310211A (en) * 2013-04-26 2013-09-18 四川大学 Filling mark recognition method based on image processing
CN103839058A (en) * 2012-11-21 2014-06-04 方正国际软件(北京)有限公司 Information locating method for document image based on standard template
CN105005793A (en) * 2015-07-15 2015-10-28 广州敦和信息技术有限公司 Method and device for automatically identifying and recording invoice character strip
CN107025452A (en) * 2016-01-29 2017-08-08 富士通株式会社 Image-recognizing method and image recognition apparatus
CN109447067A (en) * 2018-10-24 2019-03-08 北方民族大学 A kind of bill angle detecting antidote and automatic ticket checking system
CN109657665A (en) * 2018-10-31 2019-04-19 广东工业大学 A kind of invoice batch automatic recognition system based on deep learning
CN109919883A (en) * 2018-12-03 2019-06-21 南京三宝科技股份有限公司 A kind of traffic video data capture method based on gradation conversion
CN110163285A (en) * 2019-05-23 2019-08-23 阳光保险集团股份有限公司 Ticket recognition training sample synthetic method and computer storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080310721A1 (en) * 2007-06-14 2008-12-18 John Jinhwan Yang Method And Apparatus For Recognizing Characters In A Document Image
US20150286860A1 (en) * 2014-04-02 2015-10-08 Le Moustache Club S.L. Method and Device for Generating Data from a Printed Document
JP6550723B2 (en) * 2014-10-31 2019-07-31 オムロン株式会社 Image processing apparatus, character recognition apparatus, image processing method, and program
CN104766372B (en) * 2015-04-29 2017-09-26 江苏保千里视像科技集团股份有限公司 A kind of stolen a ride with recognition of face decision-making system and its application method
GB2572386B (en) * 2018-03-28 2021-05-19 Canon Europa Nv An image processing system and an image processing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
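Adaptive Solitary Pulmonary Nodule Segmentation for Digital Radiography Images Based on Random Walks and Sequential Filter; Dan Wang et al.; IEEE Access; 2017-02-14; pp. 1460-1468 *
Train ticket recognition algorithm based on OpenCV; Xue Shengli et al.; Journal of Guangxi University of Science and Technology; 2016-06-30; vol. 27, no. 02; pp. 46-51 *
Research on train ticket face information recognition algorithms; Kong Xiangqian; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2019-01-15; pp. 10-46 *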

Also Published As

Publication number Publication date
CN110991265A (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant