CN116704531A - Financial statement detection method based on lightweight YOLO model - Google Patents

Financial statement detection method based on lightweight YOLO model

Info

Publication number
CN116704531A
Authority
CN
China
Prior art keywords
financial statement
image
lightweight
detected
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310280647.3A
Other languages
Chinese (zh)
Inventor
杨玉东
赵爽
桂东昫
任昊
田庆阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHANGCHUN WHY-E SCIENCE AND TECHNOLOGY CO LTD
Original Assignee
CHANGCHUN WHY-E SCIENCE AND TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHANGCHUN WHY-E SCIENCE AND TECHNOLOGY CO LTD filed Critical CHANGCHUN WHY-E SCIENCE AND TECHNOLOGY CO LTD
Priority to CN202310280647.3A
Publication of CN116704531A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Character Input (AREA)

Abstract

A financial statement detection method based on a lightweight YOLO model belongs to the field of image recognition and comprises the following steps: step one, data preprocessing; step two, model training; step three, saving the model and obtaining predicted coordinates; step four, cropping into blocks; step five, OpenCV inspection; step six, line identification; step seven, classifying the financial statement image to be detected; step eight, OCR recognition. The invention preserves the original layout of the financial statement while extracting its text, reducing subsequent manual effort and improving working efficiency. The invention extracts tables intelligently and automatically and can store them as the electronic data documents a financial system requires. With a lightweight YOLO model as the core recognition algorithm, the method offers high detection accuracy, fast recognition and strong practicability. The invention is also broadly compatible and can intelligently recognize many kinds of financial statements.

Description

Financial statement detection method based on lightweight YOLO model
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a financial statement detection method based on a lightweight YOLO model.
Background
OCR (Optical Character Recognition) is a technique in which the printed characters of a paper document are optically scanned into a dot-matrix image file, after which recognition software converts the characters in the image into a text format that word-processing software can edit further. Extracting data from financial statement images has long been a difficulty for OCR: a conventional OCR algorithm can only extract the text in an image, not the table itself, and cannot recognize the original layout of the table and its text, so the text must afterwards be filled into the corresponding positions by hand, which greatly reduces working efficiency. In addition, extracting financial statement data with a conventional OCR algorithm often misplaces rows and columns.
Disclosure of Invention
The invention provides a financial statement detection method based on a lightweight YOLO model, aiming to solve the problems of existing OCR-based extraction of financial statement data: it cannot extract tables, cannot recognize the original layout of tables and text, and works inefficiently.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention discloses a financial statement detection method based on a lightweight YOLO model, which comprises the following steps of:
firstly, carrying out image preprocessing on a financial statement image;
step two, taking the image-preprocessed financial statement image as a data set, and training the data set by using a YOLO V4-tiny model; selecting a financial statement image to be detected for testing, and checking whether the identification of the form position in the financial statement image to be detected is accurate or not and whether the condition of missed detection and false detection exists or not;
step three, storing and calling a YOLO V4-tiny model, and transmitting a to-be-detected financial statement image into the trained YOLO V4-tiny model for prediction to obtain a prediction coordinate;
step four, cropping the target areas of the financial statement image to be detected into blocks using the predicted coordinates obtained in step three, naming the cropped images Crop1, Crop2, …, CropN in order, and recording the lower-right corner coordinates of each image;
step five, inspecting the financial statement image to be detected with OpenCV;
step six, extracting the horizontal and vertical lines of the table in the financial statement image to be detected, computing the coordinates of their intersections by image subtraction, comparing those intersection coordinates with the lower-right corner coordinates of each image from step four, and judging whether each cropped block's table position is correct;
step seven, dividing the financial statement images to be detected into: regular tables, many-to-one tables, and three-line tables;
and step eight, according to the table type judged in step seven, performing character recognition on each image cropped in step four with an OCR algorithm and outputting the recognized text in order of the coordinates recorded in step four.
Further, the specific operation steps of the first step are as follows:
and marking various types of cells in the financial statement image by using a data marking tool, wherein the marking number of the financial statement image is 500-1000, and each type of cell is at least correspondingly marked with 150 financial statement images.
In step two, the YOLO V4-tiny model reads each financial statement image as a single-channel grayscale image.
In the second step, the data set is divided into a training set and a testing set, and the ratio of the training set to the testing set is 3:7.
In the second step, 10-15 financial statement images to be detected are selected for testing after training is finished.
Further, in the fourth step, the dictionary format of the image information and the coordinate information is { 'Crop (name of the image to be cut)' [ lower right corner coordinate information ]]' s; the i coordinates of the same row are the same, the j coordinates of the same column are the same, and if the loop 1 and the loop 2 are the same and adjacent, the loop 1->[i m ,j n ],Crop2-->[i m ,j n+1 ]。
In step five, the financial statement image to be detected is binarized; if noise remains, 1-3 rounds of dilation or erosion are applied to improve image clarity.
In step six, the coordinates of the intersections of the horizontal and vertical lines in the table are compared with the lower-right corner coordinates of each image from step four. If the x- and y-coordinate differences are within ±10 pixels, the table position in the cropped block is confirmed correct; if a difference exceeds 10 pixels, the results are first recognized and output to an exception file in image order, and manual inspection is then required.
Further, the specific operation steps of the step eight are as follows:
if the table is a regular table, the output is a DataFrame with content {'Crop1 (recognized text)': [Crop2 (recognized text), Crop3 (recognized text) …]}; whenever the y-axis coordinate of the intersection changes, a new element is appended, giving {'Crop1': [Crop2, Crop3 …], 'Crop4': [Crop5, Crop6 …]}. The checked DataFrame content is output, transposed to restore the original statement layout, and saved in the required financial statement format;
if the table is a many-to-one table, extraction proceeds column by column: the first column's content is extracted, then adjacent columns are matched by their y-axis coordinates; if second-column cells y1, y2 … yn fall within the y-coordinate range of first-column cell y, cells y1, y2 … yn are taken to correspond to cell y;
if the table is a three-line table, it is written out in the naming order of the cropped images.
The beneficial effects of the invention are as follows:
1. The financial statement detection method based on the lightweight YOLO model preserves the original layout of the financial statement while extracting its text, reducing subsequent manual effort and improving working efficiency.
2. The method extracts tables intelligently and automatically and can store them as the electronic data documents a financial system requires.
3. With the lightweight YOLO model (YOLO V4-tiny) as its core recognition algorithm, the method achieves high detection accuracy, fast recognition and strong practicability.
4. The method is broadly compatible and can intelligently recognize many kinds of financial statements.
Drawings
FIG. 1 is a flow chart of a financial statement detection method based on a lightweight YOLO model of the present invention.
FIG. 2 is a standard raw image of a financial statement.
FIG. 3 shows the rules for marking the various types of cells in a financial statement image with a data annotation tool. In FIG. 3, A is the cell-marking rule for a standard original statement image, B is the rule for an original statement image in which several rows correspond to one row, and C is the rule for a three-line table.
Fig. 4 is a cut-and-block example.
Fig. 5 is a result of the image binarization processing.
Fig. 6 is a line recognition result. In fig. 6, a is a horizontal line recognition result in the table, and B is a vertical line recognition result in the table.
Fig. 7 is a regular table.
Fig. 8 is a many-to-one table.
Fig. 9 is a many-to-one range identification rule.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the invention relates to a financial statement detection method based on a lightweight YOLO model, which specifically comprises the following steps:
step one, data preprocessing;
image preprocessing is carried out on the financial statement image, and especially, image correction is carried out on the distorted image.
The specific operation steps are as follows:
and marking various types of cells in the financial statement image by using a data marking tool, wherein the marking number of the financial statement image is 500-1000, and each type of cell is at least correspondingly marked with 150 financial statement images.
A standard original financial statement image is shown in FIG. 2. In FIG. 2 the rows and columns correspond one-to-one, which is the case for a regular table; here the first row is marked from left to right, then the second row from left to right, and so on, as shown in FIG. 3A. Other kinds of original statement images also exist: FIG. 3B shows an image in which several rows correspond to one row, a many-to-one table, where every block must be marked; FIG. 3C shows a three-line table, where table lines are absent and the lettered regions are marked as target areas.
Step two, model training;
and (3) taking the preprocessed financial statement image in the step one as a data set, training the data set by using a YOLO V4-tiny model, wherein the method for reading the financial statement image by using the YOLO V4-tiny model is to read a gray level map in a single channel, and the ratio of the training set to the testing set is about 3:7. In order to verify the performance of the YOLO V4-tiny model, 10-15 to-be-detected financial statement images are selected for testing after training is finished, whether the identification of the form positions in the to-be-detected financial statement images is accurate or not is checked, and whether the condition of missed detection and false detection exists is checked.
If the total error rate of position errors (text not aligned with its cell), misses (a cell exists but is not detected) and false detections (a non-cell region detected as a cell, or several cells detected as one) exceeds 2%, training of the YOLO V4-tiny model continues until the total error rate falls below 2%.
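The acceptance test above, retraining while the combined error rate exceeds 2%, can be sketched as a small helper; the function name and argument layout are illustrative and not taken from the patent:

```python
def needs_more_training(misplaced, missed, false_detections, total_cells):
    """Combined error rate of step two: position errors, missed cells and
    false detections over all annotated cells. Training continues while
    this rate exceeds 2%."""
    error_rate = (misplaced + missed + false_detections) / total_cells
    return error_rate > 0.02

# e.g. 3 faulty cells out of 1000 annotated cells is a 0.3% rate, acceptable
decision = needs_more_training(1, 1, 1, 1000)
```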
Step three, saving the model to obtain a predicted coordinate;
and (3) storing and calling a YOLO V4-tiny model, and transmitting the to-be-detected financial statement image into the trained YOLOV4-tiny model for prediction to obtain a prediction coordinate.
Step four, cutting and blocking;
and (3) cutting and blocking the target area of the financial statement image to be detected by using the prediction coordinates obtained in the step (III), wherein the cutting and blocking effect is shown in figure 4. The images after cutting and partitioning are named as loop 1 and loop 2 … … CropN in sequence, corresponding image right lower corner coordinates are recorded, and the corresponding image information and coordinate information dictionary format is { 'loop (cut image name)' [ right lower corner coordinate information ]]'}. The i coordinates of the same row are the same, the j coordinates of the same column are the same, and if the loop 1 and the loop 2 are the same and adjacent, the loop 1->[i m ,j n ],Crop2-->[i m ,j n+1 ]。
Step five, open-cv inspection;
and (3) checking the to-be-detected financial statement image by using open-cv (Open Source ComputerVisionLibrary, open-source computer vision library), and firstly performing binarization processing on the to-be-detected financial statement image, wherein the processing effect is as shown in fig. 5, and if noise exists after the to-be-detected financial statement image is subjected to binarization processing, performing expansion or corrosion operation for 1-3 times respectively, so that the image definition is improved.
Step six, line identification processing;
after the fifth step, extracting the transverse lines and the vertical lines of the form in the to-be-detected financial statement image, and calculating the intersection point coordinates of the transverse lines and the vertical lines in the form by utilizing image subtraction operation, wherein the transverse line identification result in the form is shown as A in FIG. 6, and the vertical line identification result in the form is shown as B in FIG. 6; comparing the coordinates of the intersection points of the horizontal lines and the vertical lines in the table with the coordinates of the right lower corner of each image in the fourth step, if the xy axis coordinate difference is +/-10 pixel points, the position of the table in the image after cutting and blocking is proved to be correct, if the xy axis coordinate difference is more than 10 pixel points, the result is firstly identified according to the image sequence and output to an abnormal file, and then manual inspection is needed.
Step seven, classifying the financial statement image to be detected;
According to the line-recognition result and the pattern of intersections between the horizontal and vertical lines, the financial statement images to be detected are divided into: regular tables (intersections at identical or nearly identical spacing), as shown in FIG. 7; many-to-one tables (large variation in intersection spacing), as shown in FIG. 8; and three-line tables (no intersections of horizontal and vertical lines at all).
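The three-way split of step seven can be sketched from the intersection pattern alone; the 10-pixel uniformity tolerance is an assumed value, not taken from the patent:

```python
def classify_table(intersection_ys, tol=10):
    """Coarse table classification from the sorted, distinct y-coordinates
    of ruling intersections: no intersections means a three-line table,
    near-uniform row spacing a regular table, anything else many-to-one."""
    if not intersection_ys:
        return "three-line"
    gaps = [b - a for a, b in zip(intersection_ys, intersection_ys[1:])]
    if not gaps or max(gaps) - min(gaps) <= tol:
        return "regular"
    return "many-to-one"
```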
Step eight, OCR recognition;
and C, respectively carrying out character recognition on the images cut and segmented in the fourth step by utilizing an OCR recognition algorithm, and outputting the recognized character information according to the coordinates in the fourth step. The specific operation steps are as follows:
and D, according to the type of the financial report image to be detected, which is judged in the step seven, if the type is a regular form, outputting the data in a data frame format, wherein the content is { 'Crop1 (recognized text information)' [ Crop2 (recognized text information), crop3 (recognized text information) … … ] }, and when the y-axis coordinate of the intersection point changes, storing new elements of the data frame { 'Crop1': [ Crop2, crop3 … … ], 'Crop4': [ Crop5, crop6 … … ] } on the original data frame format. Outputting the checked content of the DataFrame format, transposing, restoring the original financial statement format, and storing the original financial statement format into the required financial statement format.
If the table is a many-to-one table, extraction proceeds column by column: the first column's content is extracted, then adjacent columns are matched by their y-axis coordinates; if second-column cells y1, y2 … yn fall within the y-coordinate range of first-column cell y, cells y1, y2 … yn are taken to correspond to cell y, as shown in FIG. 9.
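The many-to-one matching rule, assigning each second-column cell to the first-column cell whose y-range contains it, can be sketched as below; representing cells as (text, y_top, y_bottom) tuples is an assumption made for illustration:

```python
def match_many_to_one(first_col, second_col):
    """Assign each second-column cell to the first-column cell whose
    y-coordinate range contains it (the rule of FIG. 9)."""
    matches = {text: [] for text, _, _ in first_col}
    for text2, top2, bottom2 in second_col:
        for text1, top1, bottom1 in first_col:
            if top1 <= top2 and bottom2 <= bottom1:
                matches[text1].append(text2)
                break
    return matches

# hypothetical cells: one first-column entry spans several second-column rows
first = [("current assets", 0, 100), ("fixed assets", 100, 200)]
second = [("cash", 0, 50), ("inventory", 50, 100), ("equipment", 100, 200)]
```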
If the table is a three-line table, it is written out in the naming order of the cropped images.
The financial statement detection method based on the lightweight YOLO model intelligently extracts financial statement information. Using deep learning, the invention detects tables as targets, extracts the table content of financial statement images with OCR and machine-learning algorithms, and converts the result into electronic data documents. It offers a powerful tool for financial informatization and can improve the efficiency of financial management.
The foregoing is merely a preferred embodiment of the invention; modifications and improvements that those skilled in the art may make without departing from the principles of the invention are likewise within its scope of protection.

Claims (9)

1. A financial statement detection method based on a lightweight YOLO model is characterized by comprising the following steps:
firstly, carrying out image preprocessing on a financial statement image;
step two, taking the image-preprocessed financial statement images as a data set and training the data set with a YOLO V4-tiny model; selecting financial statement images to be detected for testing, and checking whether the table positions in those images are identified accurately and whether any cells are missed or falsely detected;
step three, saving and calling the YOLO V4-tiny model, and passing the financial statement image to be detected to the trained YOLO V4-tiny model for prediction, yielding the predicted coordinates;
step four, cropping the target areas of the financial statement image to be detected into blocks using the predicted coordinates obtained in step three, naming the cropped images Crop1, Crop2, …, CropN in order, and recording the lower-right corner coordinates of each image;
step five, inspecting the financial statement image to be detected with OpenCV;
step six, extracting the horizontal and vertical lines of the table in the financial statement image to be detected, computing the coordinates of their intersections by image subtraction, comparing those intersection coordinates with the lower-right corner coordinates of each image from step four, and judging whether each cropped block's table position is correct;
step seven, dividing the financial statement images to be detected into: regular tables, many-to-one tables, and three-line tables;
and step eight, according to the table type judged in step seven, performing character recognition on each image cropped in step four with an OCR algorithm and outputting the recognized text in order of the coordinates recorded in step four.
2. The method for detecting the financial statement based on the lightweight YOLO model as claimed in claim 1, wherein the specific operation steps of the first step are as follows:
and marking various types of cells in the financial statement image by using a data marking tool, wherein the marking number of the financial statement image is 500-1000, and each type of cell is at least correspondingly marked with 150 financial statement images.
3. The method for detecting the financial statement based on the lightweight YOLO model according to claim 1, wherein in step two, the YOLO V4-tiny model reads each financial statement image as a single-channel grayscale image.
4. The method for detecting the financial statement based on the lightweight YOLO model according to claim 1, wherein in the second step, the data set is divided into a training set and a testing set, and the ratio of the training set to the testing set is 3:7.
5. The method for detecting the financial statement based on the lightweight YOLO model according to claim 1, wherein in the second step, 10-15 financial statement images to be detected are selected for testing after training is finished.
6. The method for detecting a financial statement based on a lightweight YOLO model according to claim 1, wherein in step four, the dictionary mapping image information to coordinate information has the format {'CropN (cropped image name)': [lower-right corner coordinates]}; images in the same row share the same i coordinate and images in the same column share the same j coordinate, so if Crop1 and Crop2 lie in the same row and are adjacent, Crop1 --> [i_m, j_n] and Crop2 --> [i_m, j_(n+1)].
7. The method for detecting the financial statement based on the lightweight YOLO model according to claim 1, wherein in step five, the financial statement image to be detected is binarized, and if noise exists, 1-3 rounds of dilation or erosion are applied to improve image clarity.
8. The financial statement detection method based on the lightweight YOLO model according to claim 1, wherein in step six, the coordinates of the intersections of the horizontal and vertical lines in the table are compared with the lower-right corner coordinates of each image from step four; if the x- and y-coordinate differences are within ±10 pixels, the table position in the cropped block is confirmed correct; if a difference exceeds 10 pixels, the results are first recognized and output to an exception file in image order, and manual inspection is then required.
9. The method for detecting the financial statement based on the lightweight YOLO model as claimed in claim 1, wherein the specific operation steps of the step eight are as follows:
if the table is a regular table, the output is a DataFrame with content {'Crop1 (recognized text)': [Crop2 (recognized text), Crop3 (recognized text) …]}; whenever the y-axis coordinate of the intersection changes, a new element is appended, giving {'Crop1': [Crop2, Crop3 …], 'Crop4': [Crop5, Crop6 …]}; the checked DataFrame content is output, transposed to restore the original statement layout, and saved in the required financial statement format;
if the table is a many-to-one table, extraction proceeds column by column: the first column's content is extracted, then adjacent columns are matched by their y-axis coordinates; if second-column cells y1, y2 … yn fall within the y-coordinate range of first-column cell y, cells y1, y2 … yn are taken to correspond to cell y;
if the table is a three-line table, it is written out in the naming order of the cropped images.
CN202310280647.3A 2023-03-22 2023-03-22 Financial statement detection method based on lightweight YOLO model Pending CN116704531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310280647.3A CN116704531A (en) 2023-03-22 2023-03-22 Financial statement detection method based on lightweight YOLO model


Publications (1)

Publication Number Publication Date
CN116704531A true CN116704531A (en) 2023-09-05

Family

ID=87826535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310280647.3A Pending CN116704531A (en) 2023-03-22 2023-03-22 Financial statement detection method based on lightweight YOLO model

Country Status (1)

Country Link
CN (1) CN116704531A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination