CN113158755A

CN113158755A - Method for improving accuracy of bank statement recognition

Info

Publication number: CN113158755A
Application number: CN202110174145.3A
Authority: CN
Inventors: 李潇; 董伯文
Original assignee: Shanghai Fuli Technology Co Ltd
Current assignee: Shanghai Fuli Technology Co Ltd
Priority date: 2021-02-07
Filing date: 2021-02-07
Publication date: 2021-07-23

Abstract

The invention relates to the technical field of financial risk control, and in particular to a method for improving the accuracy of bank statement recognition. The method is characterized by the following process. S1: scan the paper bank statement into an electronic file and input it into a computer. S2: rotate the scanned image until its content is substantially horizontal. S3: once the image is substantially horizontal, obtain the abscissas and ordinates of the two-dimensional table formed by the content of the bank statement image. S4: divide the image according to these abscissas and ordinates so that each data item corresponds to one small image. S5: recognize the content of the small images one by one, organize the recognized content into tabular text data, and complete the data recognition. The method provides an image processing approach for preprocessing the electronic images scanned from paper bank statements, thereby improving the accuracy of bank statement recognition.

Description

Method for improving the accuracy of bank statement recognition

Technical Field

The invention relates to the technical field of financial risk control, and in particular to a method for improving the accuracy of bank statement recognition.

Background

In the field of financial risk control, bank statement analysis and its results are a very important risk control strategy and indicator. Bank statement analysis involves a large amount of calculation, so performing it with a computer system can greatly improve its efficiency. In current banking practice, bank statements are often provided on paper. If the analysis is to be done by computer, the paper statements must first be scanned and recognized. However, common bank statement recognition methods, which recognize the whole statement image at once, do not achieve a high character recognition rate. The invention provides a bank statement recognition method that improves recognition accuracy.

Disclosure of Invention

The invention provides a method for improving the accuracy of bank statement recognition that overcomes the shortcomings of the prior art by providing an image processing method for the preprocessing stage of electronic images scanned from paper bank statements.

To achieve this purpose, a method for improving the accuracy of bank statement recognition is designed, characterized in that the specific process is as follows:

S1: Scan the paper bank statement into an electronic file and input the file into a computer;

S2: Rotate the scanned image until its content is substantially horizontal;

S3: Once the image is substantially horizontal, obtain the abscissas and ordinates of the two-dimensional table formed by the content of the bank statement image;

S4: Divide the image according to these abscissas and ordinates so that each data item corresponds to one small image;

S5: Recognize the content of the small images one by one, organize the recognized content into tabular text data, and complete the data recognition.

The specific process of S2 is as follows:

S21: Determine whether the scanned image is a bank statement in two-dimensional table form. If so, perform steps S22 to S26; otherwise, perform steps S27 to S211;

S22: If the scanned image is a bank statement in two-dimensional table form, the image contains a number of short line segments. Among them, find the segments whose y coordinates are close to the minimum y coordinate of all segments; these segments form the first line of the two-dimensional table;

S23: Among the segments of this first line, record the two endpoints of the leftmost segment as (x1, y1) and (x2, y2), and the two endpoints of the rightmost segment as (x3, y3) and (x4, y4);

S24: Take (x1, y1) and (x4, y4) as the start and end points of the two-dimensional data table of the bank statement image; the line through them serves as the reference line for rotating the image;

S25: Calculate the slope from the coordinates of this line: rate = (y4 - y1)/(x4 - x1). If the absolute value of the slope is greater than 0.005, the image needs to be rotated;

S26: Rotate the image about its center point by the angle calculated from the slope rate;

S27: If the scanned image is not a bank statement in two-dimensional table form, it contains no table line segments, so the rotation must instead be adjusted using the arrangement of the data item content as the reference;

S28: Apply binarization, dilation, erosion and inversion to the image so that the regions containing data content are highlighted;

S29: Find the rectangles with relatively small areas and locate the data items in the image from their coordinates;

S210: From all the rectangles, locate the rectangles of the data item table according to the approximate position of the data items in the image;

S211: Select the coordinates of the rectangles in the first row and rotate the image using the method of steps S22 to S26 until the data items are substantially horizontal.
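The threshold in S25 can be put in perspective with a short worked calculation. It assumes, as S26 implies but does not state explicitly, that the rotation angle is the arctangent of the slope. Since the arctangent is monotonic, the 0.005 slope threshold is equivalent to an angle threshold of under a third of a degree; for a hypothetical table 2000 pixels wide, that slope means a vertical drift of 10 pixels from one end of the line to the other:

```latex
\theta = \arctan(\mathrm{rate}) = \arctan\frac{y_4 - y_1}{x_4 - x_1},
\qquad
|\mathrm{rate}| > 0.005 \iff |\theta| > \arctan(0.005) \approx 0.286^\circ

\Delta y = \mathrm{rate} \cdot \Delta x = 0.005 \times 2000\ \text{px} = 10\ \text{px}
```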

The specific process of S3 is as follows:

S31: If the scanned image is a bank statement in two-dimensional table form, obtain the coordinates of all vertical lines in the image, store the x-coordinates of all their endpoints in a list and sort the list;

S32: Group numbers that are close in value into sub-lists; all the sub-lists together form a list of lists;

S33: Average the numbers in each sub-list to form a single list;

S34: If the scanned image is not a bank statement in two-dimensional table form, obtain the rectangle coordinates of most of the data item content in the image;

S35: The key data items (the transaction records) form a two-dimensional table;

S36: Based on the positions of the data item content in the image, exclude non-transaction data items and obtain the rectangle coordinates of the key data item content.

In step S4, the image is divided: a two-dimensional grid is drawn on the image according to the abscissas and ordinates obtained in step S3, and the image is cut along the abscissas and ordinates of the two-dimensional table into a number of data images, each corresponding to a single data item.

Compared with the prior art, the invention provides an image processing method for the preprocessing stage of electronic images scanned from paper bank statements, which improves the accuracy of bank statement recognition.

Drawings

FIG. 1 is a software flow diagram of the present invention.

FIG. 2 is a schematic diagram of a bank statement in two-dimensional table form.

FIG. 3 is a schematic diagram of a bank statement not in two-dimensional table form.

FIG. 4 is a schematic diagram of a bank statement image before processing by the present invention.

FIG. 5 is a schematic diagram of a bank statement image after processing by the present invention.

Detailed Description

The invention is further illustrated below with reference to the accompanying drawings.

As shown in FIG. 1, the method for improving the accuracy of bank statement recognition comprises steps S1 to S5 described above, with step S2 carried out through sub-steps S21 to S211 and step S3 through sub-steps S31 to S36.

Because a bank statement is usually laid out as a two-dimensional spreadsheet, the invention exploits this characteristic and divides the image according to the layout of the statement: the image is cut into separate small images along the fields (such as transaction date, transaction amount, balance and remark) and the transaction records of the statement, so that each small image contains only one data item. Finally, targeted image processing and recognition are performed according to the type of each data item. For example, for data item images such as transaction amount, balance and transaction date, the range of possible recognition results can be narrowed (the result can only be a number), and the machine learning model for image recognition can be trained specifically for these items, which improves the recognition rate of the key data item content.
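The source does not say how the result range is narrowed. One possible post-processing step, assuming the OCR engine returns free text for each cell, is to map common letter/digit confusions and then accept only strings that match the field's format. The function `constrain_field`, the confusion table, the field labels ("amount", "balance", "date") and the accepted date formats are all illustrative assumptions:

```python
import re

# Illustrative letter-to-digit confusions (an assumption, not from the source),
# applied only to fields whose value must be numeric.
DIGIT_CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

AMOUNT_PATTERN = re.compile(r"-?\d+(?:\.\d{1,2})?")
# Accepts e.g. 2021-02-07, 2021/02/07, 2021.02.07 or 20210207.
DATE_PATTERN = re.compile(r"(\d{4})[-/.]?(\d{2})[-/.]?(\d{2})")


def constrain_field(raw_text, field_type):
    """Narrow a raw OCR string to the format allowed for its field.

    field_type is a hypothetical label such as "amount", "balance" or "date".
    Returns the normalized value, or None if the text cannot be valid.
    """
    text = raw_text.strip()
    if field_type in ("amount", "balance"):
        # Drop spaces and thousands separators, then fix letter/digit mix-ups.
        text = text.replace(" ", "").replace(",", "").translate(DIGIT_CONFUSIONS)
        return text if AMOUNT_PATTERN.fullmatch(text) else None
    if field_type == "date":
        text = text.replace(" ", "").translate(DIGIT_CONFUSIONS)
        match = DATE_PATTERN.fullmatch(text)
        # Normalize every accepted date format to YYYY-MM-DD.
        return "-".join(match.groups()) if match else None
    return text  # free-text fields such as remarks are left unconstrained
```

For example, `constrain_field("1,2O5.5O", "amount")` yields `"1205.50"`, while `constrain_field("abc", "amount")` yields `None`, so an impossible reading is rejected instead of silently stored.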

Before the content of the bank statement image can be segmented in two dimensions, the image must be processed as follows. First, the image is rotated so that its content is substantially horizontal: scanned bank statement images may be skewed, and cutting by coordinates only works if the image is substantially horizontal (step S2 in FIG. 1). Second, once the image is substantially horizontal, the abscissas and ordinates of the two-dimensional table formed by its content are obtained (step S3 in FIG. 1). Third, the image is divided according to these abscissas and ordinates so that each data item becomes one small image (step S4 in FIG. 1). Finally, the content of the small images is recognized one by one and organized into tabular text data, which completes the data recognition. Because each small image contains only one type of content, measures that improve recognition accuracy can be tailored to that specific content (step S5 in FIG. 1).

In step S2, a line must be found in the image to serve as the reference for rotation; rotating the image until this reference line is close to horizontal keeps the whole content horizontal. Two cases are distinguished: case one is a bank statement in two-dimensional table form, as shown in FIG. 2; case two is a bank statement without a two-dimensional table, as shown in FIG. 3.

In case one, horizontal line segments are searched for in the image. Because of problems with the bank statement itself, scanning quality and the like, the search may return many short line segments, which are merged to form all the horizontal lines of the two-dimensional table. Among these segments, those whose y coordinates are close to the minimum y coordinate (less than 20 pixels away) are selected; they form the first line of the two-dimensional table. Within this first line, the two endpoints of the leftmost segment are recorded as (x1, y1) and (x2, y2), and the two endpoints of the rightmost segment as (x3, y3) and (x4, y4). Taking (x1, y1) and (x4, y4) as the start and end points of the two-dimensional data table of the bank statement image, the line through them is the reference line for rotating the image. The slope (the tangent of the line's inclination angle) is calculated from these coordinates: rate = (y4 - y1)/(x4 - x1). If the absolute value of the slope is too large (greater than 0.005), the image is rotated about its center point by the angle obtained from the slope rate. The slope of the first line is then obtained, calculated and checked again, repeating until the image is substantially horizontal.
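The reference-line selection and slope check of case one can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it assumes the horizontal segments have already been detected and merged and are given as `(xa, ya, xb, yb)` tuples in image coordinates (y grows downward), and it takes the rotation angle as `atan(rate)`. The helper names are hypothetical, and `rotate_point` only shows the geometry on coordinates rather than rotating pixels:

```python
import math


def top_row_reference_line(segments, tolerance=20):
    """Return ((x1, y1), (x4, y4)), the ends of the table's first line.

    Segments whose upper y lies less than `tolerance` pixels below the
    minimum y of all segments are treated as the first line (20 px per the text).
    """
    # Order each segment's endpoints left to right.
    segs = [(xa, ya, xb, yb) if xa <= xb else (xb, yb, xa, ya)
            for xa, ya, xb, yb in segments]
    min_y = min(min(s[1], s[3]) for s in segs)
    first_line = [s for s in segs if min(s[1], s[3]) - min_y < tolerance]
    leftmost = min(first_line, key=lambda s: s[0])
    rightmost = max(first_line, key=lambda s: s[2])
    return (leftmost[0], leftmost[1]), (rightmost[2], rightmost[3])


def skew_rate(start, end):
    """Slope of the reference line: rate = (y4 - y1) / (x4 - x1)."""
    (x1, y1), (x4, y4) = start, end
    return (y4 - y1) / (x4 - x1)


def needs_rotation(rate, threshold=0.005):
    return abs(rate) > threshold


def rotate_point(x, y, cx, cy, rate):
    """Rotate (x, y) about (cx, cy) so that a line of slope `rate` becomes level.

    In y-down image coordinates this is a rotation by -atan(rate), which
    appears counter-clockwise on screen.
    """
    theta = math.atan(rate)
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x - cx, y - cy
    return cx + dx * c + dy * s, cy - dx * s + dy * c
```

To rotate the image itself, OpenCV's `cv2.getRotationMatrix2D(center, angle, scale)` treats a positive angle as counter-clockwise with a top-left origin, so the equivalent call would pass `math.degrees(math.atan(rate))` as the angle and apply the matrix with `cv2.warpAffine`. Repeating detection and the check until `needs_rotation` returns False mirrors the iteration described above.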

In case two, the image to be analyzed contains no two-dimensional data table, so the rotation must be adjusted using the arrangement of the data item content as the reference. First, operations such as binarization, dilation, erosion and inversion are applied to the image so that the regions containing data content are highlighted, as shown in FIG. 4. In FIG. 4, rectangles with relatively small areas can be found, and the data items in the image can be located from the coordinates of these rectangles (the located items are shown in FIG. 5). From all the rectangles in FIG. 5, the rectangles of the data item table can be identified based on the approximate position of the data items in the image. The coordinates of the rectangles in the first row are selected and the image is rotated in the same way as in case one, so that the data items are substantially horizontal.
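Locating the rectangles of case two can be illustrated without an image library. The sketch below assumes the binarization, dilation, erosion and inversion have already produced a 0/1 mask (a list of rows, 1 marking highlighted content) and finds the bounding rectangle of each 4-connected region with a breadth-first search; in practice OpenCV's `cv2.findContours` and `cv2.boundingRect` serve the same purpose. The `max_area` filter, the 20-pixel first-row tolerance and the use of rectangle centres as reference points are assumptions, since the text does not specify them:

```python
from collections import deque


def bounding_boxes(mask, max_area=None):
    """Bounding rectangles (x, y, w, h) of 4-connected regions of 1s in `mask`.

    Rectangles larger than `max_area` are discarded, keeping the
    relatively small rectangles that correspond to data items.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if mask[sy][sx] == 0 or seen[sy][sx]:
                continue
            # Breadth-first search over one region, tracking its extent.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            w, h = x1 - x0 + 1, y1 - y0 + 1
            if max_area is None or w * h <= max_area:
                boxes.append((x0, y0, w, h))
    return boxes


def first_row(boxes, tolerance=20):
    """Rectangles whose top edge is within `tolerance` of the topmost one, left to right."""
    top = min(b[1] for b in boxes)
    return sorted((b for b in boxes if b[1] - top < tolerance), key=lambda b: b[0])


def first_row_rate(boxes, tolerance=20):
    """Slope between the centres of the leftmost and rightmost first-row rectangles."""
    row = first_row(boxes, tolerance)
    if len(row) < 2:
        raise ValueError("need at least two rectangles in the first row")
    (lx, ly, lw, lh), (rx, ry, rw, rh) = row[0], row[-1]
    return ((ry + rh / 2) - (ly + lh / 2)) / ((rx + rw / 2) - (lx + lw / 2))
```

The resulting slope can then be checked against the same 0.005 threshold and used for rotation exactly as in case one.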

Step S3 is the most critical step: it obtains the abscissas and ordinates that divide the content of the image in two dimensions. Because the image has already been rotated in step S2 so that the data content is substantially horizontal, obtaining the coordinates of the rectangles containing the smallest data units ensures that the image is divided correctly in step S4.

In case one, the coordinates of all vertical lines in the image are obtained with the same line-detection method used in step S2. The x-coordinates of all endpoints of the vertical lines (the start and end point of each line) are stored in a list and sorted, for example: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 40, 40, 41, 41, 42, 42, 43, 43, 131, 131, 132, 132, 133, 133, 297, 297, 298, 298, 299, 299, 300, 300, 422, 422, 423, 423, 424, 424, 612, 612, 613, 613, 614, 614, 615, 615, 615, 615, 1438, 1438, 1439, 1439, 1440, 1440, 1441, 1441, 1741, 1741, 1741, 1741, 1742, 1742, 1743, 1743, 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2283, 2283, 2284, 2284, 2285, 2285, 2554, 2554, 2555, 2555, 2556, 2556, 2557, 2557, 2684, 2684, 2685, 2685, 2686, 2686, 2687, 2687, 2688, 2688, 2776, 2776, 2777, 2777, 2778, 2778, 2779, 2779, 2780, 2780, 2781, 2781, 2782, 2782, 2783, 2783, 2784, 2784].

Numbers that are close in value are grouped into sub-lists, and all the sub-lists together form a list of lists, for example:

[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8];

[40, 40, 41, 41, 42, 42, 43, 43];

[131, 131, 132, 132, 133, 133];

[297, 297, 298, 298, 299, 299, 300, 300];

[422, 422, 423, 423, 424, 424];

[612, 612, 613, 613, 614, 614, 615, 615, 615, 615];

[1438, 1438, 1439, 1439, 1440, 1440, 1441, 1441];

[1741, 1741, 1741, 1741, 1742, 1742, 1743, 1743];

[2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016];

[2283, 2283, 2284, 2284, 2285, 2285];

[2554, 2554, 2555, 2555, 2556, 2556, 2557, 2557];

[2684, 2684, 2685, 2685, 2686, 2686, 2687, 2687, 2688, 2688];

[2776, 2776, 2777, 2777, 2778, 2778, 2779, 2779, 2780, 2780, 2781, 2781, 2782, 2782, 2783, 2783, 2784, 2784].

The numbers in each sub-list are averaged to form a single list, for example: [4, 42, 132, 298, 423, 614, 1440, 1742, 2014, 2284, 2556, 2686, 2780].

Note, however, that vertical lines near the edges of the image are not part of the two-dimensional table but are paper edge lines, so coordinates near the image edges must be excluded, giving: [42, 132, 298, 423, 614, 1440, 1742, 2014, 2284, 2556, 2686]. These values are the abscissas of the two-dimensional table in the image.
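The grouping, averaging and edge-exclusion steps can be sketched in Python. The text does not say how close values must be to fall into one group, how close to the edge a line must be to count as a paper edge, or how averages are rounded. The 10-pixel gap, the 30-pixel margin, the hypothetical 2800-pixel image width and the use of Python's built-in `round()` (round half to even) are assumptions, chosen so that the sketch reproduces the example lists given in the text:

```python
def group_close_values(values, max_gap=10):
    """Split sorted values into groups wherever neighbours differ by more than max_gap."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def table_lines(values, image_size, max_gap=10, edge_margin=30):
    """Average each group, then drop positions within edge_margin of the image border.

    image_size is the image width for abscissas (or height for ordinates).
    """
    groups = group_close_values(values, max_gap)
    # round() rounds halves to even, e.g. 41.5 -> 42 and 298.5 -> 298.
    centres = [round(sum(g) / len(g)) for g in groups]
    return [c for c in centres if edge_margin <= c <= image_size - edge_margin]
```

Called on the sorted endpoint list from the text with `image_size=2800`, `table_lines` returns `[42, 132, 298, 423, 614, 1440, 1742, 2014, 2284, 2556, 2686]`. The same function applied to the y-coordinates of the horizontal lines, with the image height, would give the ordinates.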

The ordinates of the two-dimensional table in the image can be obtained in the same way.

Through the above operations, step S3 obtains the abscissas and ordinates of the two-dimensional table in the image.

In case two, no table line segments can be found in the image to divide the data items. However, the rectangle coordinates of the vast majority of the data item content in the image (marked in FIG. 5) have already been obtained in step S2. The key data items (the transaction record data) form a two-dimensional table. Non-transaction data items are excluded according to the positions of the key data items in the image, and the rectangle coordinates of the key data items are obtained. The distribution of these coordinates shows the same clustering of same-type data item coordinates as in case one. Using this clustering, the abscissas and ordinates that divide the data item content are obtained.
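The text does not spell out the clustering rule for case two. One way to realize it, offered here as a swapped-in technique rather than the patent's method, is interval merging: the x-ranges of the key rectangles that overlap are merged into column bands, the y-ranges into row bands, and the cut positions are the outer edges plus the midpoints of the gaps between neighbouring bands. The rectangles are assumed to be `(x, y, w, h)` tuples with non-transaction items already removed; the function names are hypothetical:

```python
def band_boundaries(intervals):
    """Merge overlapping (start, end) intervals into bands and return cut positions.

    The result is the outer start, the midpoints of the gaps between
    neighbouring bands, and the outer end.
    """
    bands = []
    for start, end in sorted(intervals):
        if bands and start <= bands[-1][1]:
            bands[-1][1] = max(bands[-1][1], end)  # overlaps: extend the current band
        else:
            bands.append([start, end])  # gap: start a new band
    gaps = [(a[1] + b[0]) / 2 for a, b in zip(bands, bands[1:])]
    return [bands[0][0]] + gaps + [bands[-1][1]]


def grid_from_boxes(boxes):
    """Abscissas and ordinates of the table implied by key data item rectangles."""
    xs = band_boundaries([(x, x + w) for x, y, w, h in boxes])
    ys = band_boundaries([(y, y + h) for x, y, w, h in boxes])
    return xs, ys
```

For two rows of three rectangles such as `(10, 10, 60, 20), (100, 10, 40, 20), (200, 10, 80, 20)` and `(12, 50, 50, 20), (105, 50, 30, 20), (210, 50, 60, 20)`, this gives abscissas `[10, 85.0, 170.0, 280]` and ordinates `[10, 40.0, 70]`. A limitation of this choice is that a single wide item spanning two columns would merge those columns into one band.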

In step S4, the image is divided. A two-dimensional grid is drawn on the image according to the abscissas and ordinates obtained in step S3, and the image is cut along the abscissas and ordinates of the two-dimensional table into a number of data images, each corresponding to a single data item.
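Given the abscissas and ordinates from step S3, the grid cells of step S4 are simply the rectangles between adjacent cut positions. The sketch below computes one `(left, top, right, bottom)` box per cell, row by row; `grid_cells` is a hypothetical helper. Pillow's `Image.crop` takes a box in exactly this `(left, upper, right, lower)` form, so each cell could then be cut out with `img.crop(cell)`:

```python
def grid_cells(xs, ys):
    """Return one (left, top, right, bottom) box per table cell, row by row.

    xs: abscissas of the table lines; ys: ordinates of the table lines.
    """
    xs, ys = sorted(xs), sorted(ys)
    return [
        (left, top, right, bottom)
        for top, bottom in zip(ys, ys[1:])      # consecutive horizontal lines
        for left, right in zip(xs, xs[1:])      # consecutive vertical lines
    ]
```

With the eleven abscissas obtained in the example above, each row of the statement would yield ten cells, one per column.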

In step S5, the image of each data item is processed and recognized. In this step, the range of recognition results for some key data items (such as amount and transaction date) can be narrowed to improve the accuracy of OCR. Each image can also be recognized by several methods so that the results cross-validate each other, which can greatly improve recognition accuracy.
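The text does not say how the results of several recognition methods are combined. A simple possibility, sketched here as an assumption, is a majority vote per cell: the most common result wins, and the cell is accepted only if enough recognizers agree. `cross_validate` and its `min_agreement` parameter are hypothetical; `None` stands for a result that a recognizer or a format check rejected:

```python
from collections import Counter


def cross_validate(candidates, min_agreement=2):
    """Majority vote over the results of several recognizers for one cell.

    Returns (value, accepted): value is the most common non-None result
    (ties go to the first one encountered), and accepted is True only if
    at least min_agreement recognizers produced that value.
    """
    valid = [c for c in candidates if c is not None]
    if not valid:
        return None, False
    value, count = Counter(valid).most_common(1)[0]
    return value, count >= min_agreement
```

For instance, `cross_validate(["1205.50", "1205.50", "1206.50"])` returns `("1205.50", True)`, whereas three disagreeing readings return the first one with `accepted` set to False, so the cell can be flagged instead of trusted.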

Claims

1. A method for improving the accuracy of bank statement recognition, characterized in that the specific process comprises steps S1 to S5 as set out in the Disclosure of Invention.

2. The method for improving the accuracy of bank statement recognition according to claim 1, wherein the specific process of S2 comprises steps S21 to S211 as set out in the Disclosure of Invention.

3. The method for improving the accuracy of bank statement recognition according to claim 1, wherein the specific process of S3 comprises steps S31 to S36 as set out in the Disclosure of Invention, including:

S33: averaging the numbers in each sub-list to form a single list;

S35: the key data items forming a two-dimensional table;

4. The method for improving the accuracy of bank statement recognition according to claim 1, wherein in step S4 the image is divided: a two-dimensional grid is drawn on the image according to the abscissas and ordinates obtained in step S3, and the image is cut along the abscissas and ordinates of the two-dimensional table into a number of data images, each corresponding to a single data item.