WO2022236875A1

WO2022236875A1 - File scanning method, device, medium and product

Info

Publication number: WO2022236875A1
Application number: PCT/CN2021/095960
Authority: WO
Inventors: Liu Guanglu (刘光禄); Yang Xu (杨旭); Duan Wuhui (段无悔); Zhang Shoulong (张守龙)
Original assignee: GRG Banking Equipment Co., Ltd. (广州广电运通金融电子股份有限公司)
Priority date: 2021-05-14
Filing date: 2021-05-26
Publication date: 2022-11-17
Also published as: CN113286053A; CN113286053B

Abstract

A file scanning method, comprising the steps of receiving a target image, extracting contour features, first filtering, overlap processing, second filtering, calculating distance feature values, calculating one-dimensional head-up feature values, facing judgment, and rotation correction. The method stores target images in an orderly manner with no manual intervention at any stage. When scanning, the user does not need to check the position of a file in the scanning apparatus, the file can be placed into the scanning apparatus in any orientation, and no manual correction is needed afterwards. This greatly increases the efficiency of digitizing images of existing physical files.

Description

A file scanning method, device, medium and product

Technical field

The invention relates to the field of file scanning processing, in particular to a file scanning method, device, medium and product.

Background art

With the advance of the digital age, information traditionally recorded on paper or other physical media is rapidly being converted to, and stored in, digital form. For example, scanning valuable documents used in daily life, such as invoices, deposit receipts and checks, and scanning and archiving documents such as certification materials, certificates and ID cards, are all ways of digitally converting and storing information. To facilitate subsequent digital information management and improve production efficiency, an important prerequisite is that the images obtained by scanning physical documents are stored in an orderly manner.

At present, when documents are scanned to store their image information digitally, the scanned images usually have to be correctly oriented; that is, the documents must be stored in order, with the front side first and the back side after it. To ensure that scanned document images are correctly oriented, one of the following approaches is typically used: 1. according to the characteristics of the machine, the documents to be scanned are fed into the machine in a specific, correct orientation and order; 2. the documents are fed into the machine in arbitrary orientations, and the image information is later corrected and adjusted manually on an electronic device. Both approaches require manual intervention, which greatly reduces production efficiency.

Summary of the invention

To overcome the deficiencies of the prior art, a first object of the present invention is to provide a document scanning method that eliminates the manual intervention, and the resulting loss of production efficiency, currently required when scanning documents that contain headers.

A second object of the present invention is to provide an electronic device that solves the same problem of manual intervention, and the resulting loss of production efficiency, when scanning documents that contain headers.

A third object of the present invention is to provide a computer-readable storage medium that solves the same problem.

A fourth object of the present invention is to provide a computer program product that solves the same problem.

The first object of the present invention is achieved by the following technical solution:

A file scanning method, applied to scanning a target file that contains a header, comprising the following steps:

Receiving the target image: receiving the target image obtained by scanning the target file;

Extracting contour features: performing contour feature extraction on the characters in the target image to obtain a contour feature structured information set containing several pieces of contour feature structured information, each piece including an area value and coordinate information, the coordinate information including an abscissa value and an ordinate value;

First filtering: filtering all the contour feature structured information in the contour feature structured information set according to a pre-stored first feature filtering threshold, and saving the contour feature structured information whose area value is greater than the first feature filtering threshold as first contour feature structured information;

Overlap processing: judging whether the projections of the first contour feature structured information onto the ordinate direction overlap, and saving the first contour feature structured information whose ordinate-direction projections overlap as second contour feature structured information;

Second filtering: filtering all the first contour feature structured information according to a pre-stored second feature filtering threshold, and saving the first contour feature structured information whose area value is greater than the second feature filtering threshold as the second contour feature structured information;

Calculating the distance feature value: calculating the one-dimensional distance feature value corresponding to each piece of second contour feature structured information from the pre-stored ordinate value of the target image centerline and the ordinate value in that second contour feature structured information;

Calculating the one-dimensional head-up feature value: calculating the corresponding one-dimensional head-up feature value from the area value and the one-dimensional distance feature value of each piece of second contour feature structured information;

Facing judgment: judging whether the ordinate value in the second contour feature structured information corresponding to the largest one-dimensional head-up feature value is greater than the ordinate value of the target image centerline; if so, the target image is upside down and the rotation correction step is performed; if not, the target image is upright and is output to the host computer for storage;

Rotation correction: rotating the target image by 180° and outputting the rotated target image to the host computer for storage.

Further, before the first filtering step, the method also includes calculating the first feature filtering threshold: sorting all the contour feature structured information in the contour feature structured information set by area value from largest to smallest, taking the number of pieces of contour feature structured information as a first quantity value, calculating a first target serial number from a first preset coefficient and the first quantity value, and storing the area value of the contour feature structured information at the first target serial number as the first feature filtering threshold.

Further, before the second filtering step, the method also includes calculating the second feature filtering threshold: sorting the second contour feature structured information by area value from largest to smallest, taking the number of pieces of second contour feature structured information as a second quantity value, calculating a second target serial number from the second quantity value and a second preset coefficient, and taking the area value of the contour feature structured information at the second target serial number as the second feature filtering threshold.

Further, before the step of extracting contour features, the method also includes image preprocessing, in which the target image is binarized.

Further, the image preprocessing specifically includes binarizing the target image using an average gray threshold method.

Further, a connected domain segmentation algorithm is used to extract contour features from the characters in the target image.

Further, calculating the one-dimensional head-up feature value specifically includes taking the product of the area value and the one-dimensional distance feature value of each piece of second contour feature structured information as the corresponding one-dimensional head-up feature value.

The second object of the present invention is achieved by the following technical solution:

An electronic device, comprising: a processor;

a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the file scanning method described in this application.

The third object of the present invention is achieved by the following technical solution:

A computer-readable storage medium storing a computer program which, when executed by a processor, performs the file scanning method described in this application.

The fourth object of the present invention is achieved by the following technical solution:

A computer program product, including a computer program, characterized in that, when the computer program is executed by a processor, the file scanning method described in this application is implemented.

Compared with the prior art, the present invention has the following beneficial effects. When scanning a target document containing a header, the document scanning method of the present application identifies the header of the target file by performing contour feature extraction, first filtering, overlap processing, second filtering and distance feature value calculation on the target image. It then determines whether the orientation of the target file is correct from the obtained one-dimensional head-up feature values and the ordinate value of the target image centerline, and decides according to that orientation whether to apply automatic rotation correction to the target image. As a result, target images are stored in an orderly manner without manual intervention: the user does not need to check the position of the document in the scanning device, the document can be fed into the scanning device in any orientation, and no manual correction is required afterwards, which greatly improves the efficiency of digitizing images of existing physical documents.

The above description is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to this description, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Description of drawings

The accompanying drawings described here are provided for a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic flowchart of a file scanning method of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below can be combined arbitrarily to form new embodiments.

The file scanning method in this embodiment is applied to scanning a target file containing a header and, as shown in FIG. 1, specifically comprises the following steps:

Receiving the target image: the target image obtained by scanning the target file is received. In this embodiment, the user places the target document in the document holder of the scanning device, the scanning unit of the scanning device scans the target document to obtain the target image, and the overall width and height (i.e., length) of the target image are obtained at the same time. Half of the height is stored as the ordinate value of the target image centerline.

Image preprocessing: the target image is binarized using the average gray threshold method; the width and height of the binarized target image remain unchanged.
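The average gray threshold method is not spelled out further in the text. Below is a minimal sketch. It assumes the method compares every pixel against the mean gray level of the whole image, treating pixels darker than the mean as character strokes (value 1). The list-of-rows image representation and the name `binarize_mean_threshold` are illustrative assumptions.

```python
def binarize_mean_threshold(gray):
    """Binarize a grayscale image with the average gray threshold method (sketch).

    gray: list of rows of gray levels in 0..255 (hypothetical representation).
    Returns a same-sized list of rows: 1 = character stroke, 0 = background.
    """
    pixels = [p for row in gray for p in row]
    if not pixels:
        # Empty image: nothing to threshold, keep the (empty) shape unchanged.
        return [list(row) for row in gray]
    threshold = sum(pixels) / len(pixels)  # mean gray level of the whole image
    # Assumption: ink is darker than paper, so below-mean pixels are foreground.
    return [[1 if p < threshold else 0 for p in row] for row in gray]


if __name__ == "__main__":
    page = [
        [250, 250, 10],
        [250, 20, 250],
    ]
    print(binarize_mean_threshold(page))  # [[0, 0, 1], [0, 1, 0]]
```

The output has exactly one value per input pixel, so the width and height are preserved, as the embodiment requires.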

Extracting contour features: a connected domain segmentation algorithm is used to extract contour features from the characters in the target image, yielding a contour feature structured information set containing several pieces of contour feature structured information. Each piece includes an area value, coordinate information, a width value and a height value, and the coordinate information includes an abscissa value and an ordinate value. The coordinate information locates a point in a coordinate system whose origin is any one of the four corners of the target image, with the width direction of the target image as the abscissa axis and the height direction as the ordinate axis. The abscissa value is the distance from the corresponding character to the ordinate axis, and the ordinate value is the distance from the corresponding character to the abscissa axis.
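The description does not name a particular connected domain segmentation algorithm. The sketch below uses plain breadth-first labeling of 8-connected foreground pixels. Several choices here are assumptions made for illustration: the 8-connectivity, the pixel count as the area value, the bounding-box top-left corner as the coordinate, a top-left origin with the ordinate growing downward, and the `ContourInfo` field names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ContourInfo:
    """One piece of contour feature structured information (hypothetical field names)."""
    area: int  # number of foreground pixels in the component
    x: int     # abscissa of the bounding box's left edge
    y: int     # ordinate of the bounding box's top edge (origin top-left, y grows downward)
    w: int     # bounding-box width
    h: int     # bounding-box height


def extract_contours(binary):
    """Label 8-connected foreground regions of a 0/1 image; return one record per region."""
    if not binary:
        return []
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    records = []
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            area, top, bottom, left, right = 0, r0, r0, c0, c0
            while queue:
                r, c = queue.popleft()
                area += 1
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            records.append(ContourInfo(area, left, top, right - left + 1, bottom - top + 1))
    return records


if __name__ == "__main__":
    binary = [
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    for rec in extract_contours(binary):
        print(rec)
```

A production implementation would more likely call an image library's connected-component routine; the hand-written version only keeps the sketch dependency-free.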

Calculating the first feature filtering threshold: all the contour feature structured information in the contour feature structured information set is sorted by area value from largest to smallest, i.e., the contour feature structured information with the largest area value is ranked first, and so on. The number of pieces of contour feature structured information is taken as the first quantity value, the first target serial number is calculated from the first preset coefficient and the first quantity value, and the area value of the contour feature structured information at the first target serial number is stored as the first feature filtering threshold. In this embodiment, let the first target serial number be K, the first quantity value be A_N and the first preset coefficient be α; then K = α × A_N, where α ∈ (0, 1), and α is preferably 0.15. For example, if the first quantity value is 100 and the first preset coefficient is 0.15, the first target serial number is 15: the 15th-ranked entry is selected from the contour feature structured information sorted from largest to smallest, and its area value is stored as the first feature filtering threshold.

First filtering: all the contour feature structured information in the contour feature structured information set is filtered according to the pre-stored first feature filtering threshold, and the contour feature structured information whose area value is greater than the first feature filtering threshold is saved as the first contour feature structured information.
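The first and second filtering thresholds follow the same rank-based pattern. Sort the area values in descending order, take the entry ranked K = coefficient × count, and keep only records whose area is strictly greater. Below is a minimal sketch. The text does not say how a non-integer K is handled, so rounding to the nearest integer (clamped to a valid rank) is an assumption. The dict records with an `area` key are also a hypothetical representation.

```python
def rank_threshold(areas, coeff):
    """Area value at the K-th rank (1-based, largest first), with K = coeff * len(areas).

    Assumption: a fractional K is rounded to the nearest integer and clamped to
    1..len(areas); the description does not say how it is handled.
    """
    if not areas:
        raise ValueError("no contour records to rank")
    ranked = sorted(areas, reverse=True)  # largest area ranked first
    k = min(max(round(coeff * len(ranked)), 1), len(ranked))
    return ranked[k - 1]


def filter_by_area(records, threshold):
    """Keep only records whose area value is strictly greater than the threshold."""
    return [rec for rec in records if rec["area"] > threshold]


if __name__ == "__main__":
    # First filtering: 100 records, alpha = 0.15 -> threshold is the 15th-largest area.
    print(rank_threshold(list(range(100, 0, -1)), 0.15))  # 86
    # Second filtering: 50 records, alpha_t = 0.5 -> threshold is the 25th-largest area.
    print(rank_threshold(list(range(50, 0, -1)), 0.5))  # 26
```

These calls reproduce the embodiment's worked examples: 100 records with α = 0.15 select the 15th-largest area, and 50 records with αt = 0.5 select the 25th-largest.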

Overlap processing: it is judged whether the projections of the first contour feature structured information onto the ordinate direction overlap, and the first contour feature structured information whose ordinate-direction projections overlap is saved as the second contour feature structured information.
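In this sketch, "overlapping in the ordinate direction" is read as the vertical extents [y, y + h) of two bounding boxes intersecting. Under that reading, characters on the same text line, such as those of a header, are kept. Both the pairwise interpretation and the dict keys `y` and `h` are assumptions. The quadratic loop favors clarity over speed.

```python
def vertical_overlap(a, b):
    """True if the projections of two boxes onto the ordinate axis share any interval."""
    # Half-open extents [y, y + h): boxes that merely touch do not overlap.
    return a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]


def keep_row_overlapping(records):
    """Keep records whose vertical extent overlaps that of at least one other record."""
    kept = []
    for i, a in enumerate(records):
        if any(vertical_overlap(a, b) for j, b in enumerate(records) if j != i):
            kept.append(a)
    return kept


if __name__ == "__main__":
    boxes = [
        {"y": 0, "h": 10},   # shares a text line with the next box
        {"y": 5, "h": 10},
        {"y": 30, "h": 5},   # isolated: dropped
    ]
    print(keep_row_overlapping(boxes))  # [{'y': 0, 'h': 10}, {'y': 5, 'h': 10}]
```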

Calculating the second feature filtering threshold: the second contour feature structured information is sorted by area value from largest to smallest, and the number of pieces of second contour feature structured information is taken as the second quantity value. The second target serial number is calculated from the second quantity value and the second preset coefficient, and the area value of the contour feature structured information at the second target serial number is taken as the second feature filtering threshold. In this embodiment, let the second preset coefficient be αt, the second target serial number be Kt and the second quantity value be A_Nt; then Kt = αt × A_Nt, where αt ∈ (0, 1], and αt is preferably 0.5. For example, if the second quantity value is 50 and the second preset coefficient is 0.5, the second target serial number is 25: the 25th-ranked entry is selected from the second contour feature structured information sorted from largest to smallest, and its area value is stored as the second feature filtering threshold.

Second filtering: all the first contour feature structured information is filtered according to the pre-stored second feature filtering threshold, and the first contour feature structured information whose area value is greater than the second feature filtering threshold is saved as the second contour feature structured information.

Calculating the distance feature value: the one-dimensional distance feature value corresponding to each piece of second contour feature structured information is calculated from the pre-stored ordinate value of the target image centerline and the ordinate value in that second contour feature structured information. In this embodiment, the height of the target image is obtained in the preceding step of receiving the target image, and half of that height is used as the ordinate value of the target image centerline. If the height of the target image is $H$, the ordinate value of the target image centerline is

$$\frac{H}{2}$$

Let the one-dimensional distance feature value be $d_j$ and the ordinate value in the second contour feature structured information be $\mathrm{Reg2}[j].y$, where $j$ is the position number of the second contour feature structured information, j = 0, 1, ..., A_N2; then

$$d_j = \left| \mathrm{Reg2}[j].y - \frac{H}{2} \right|$$

Calculating the one-dimensional head-up feature value: the corresponding one-dimensional head-up feature value is calculated from the area value and the one-dimensional distance feature value of each piece of second contour feature structured information. Specifically, the product of the area value and the one-dimensional distance feature value is used as the corresponding one-dimensional head-up feature value.
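Below is a minimal sketch of the distance and head-up feature values. It assumes the records are dicts with hypothetical `area` and `y` keys and that the centerline ordinate is half the image height. It computes d_j = |y_j - H/2| and head-up value = area_j × d_j.

```python
def head_up_values(records, image_height):
    """Return (d_j, head-up value) for each record.

    d_j = |y_j - H / 2| and head-up value = area_j * d_j.
    Records are dicts with hypothetical 'area' and 'y' keys.
    """
    center_y = image_height / 2  # ordinate of the target image centerline
    values = []
    for rec in records:
        d = abs(rec["y"] - center_y)  # one-dimensional distance feature value
        values.append((d, rec["area"] * d))  # one-dimensional head-up feature value
    return values


if __name__ == "__main__":
    recs = [{"area": 40, "y": 10}, {"area": 5, "y": 90}]
    print(head_up_values(recs, 100))  # [(40.0, 1600.0), (40.0, 200.0)]
```

In the example, both records lie equally far from the centerline, so the larger character gets the larger head-up value. The product favors large characters far from the centerline, which is presumably why a header printed near one edge dominates.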

Facing judgment: it is judged whether the ordinate value in the second contour feature structured information corresponding to the largest one-dimensional head-up feature value is greater than the ordinate value of the target image centerline. If so, the target image is upside down and the rotation correction step is performed; if not, the target image is upright and is output to the host computer for storage. In this embodiment, let $Y$ be the ordinate value in the second contour feature structured information corresponding to the largest one-dimensional head-up feature value. The ordinate value of the target image centerline is $\frac{H}{2}$, and it is judged whether

$$Y > \frac{H}{2}$$

holds. If it holds, the target image is upside down and needs to be corrected. If it does not hold, the target image is upright, no correction is needed, and the target image can be output directly to the host computer for storage as the image corresponding to the target file.
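The facing judgment and rotation correction can be sketched as follows. The sketch assumes the origin is the top-left corner with the ordinate increasing downward, the usual image convention. Under that convention, Y > H/2 places the dominant header candidate in the lower half of the page; with a bottom origin the comparison would have to be mirrored. Records are hypothetical dicts, and the image is a list of rows. Returning "upright" when no candidates remain is also an assumption.

```python
def is_upside_down(records, image_height):
    """Facing judgment: True if the record with the largest head-up value lies below the centerline.

    Assumes the origin is the top-left corner and the ordinate grows downward.
    """
    if not records:
        # Assumption: with no header candidates, leave the image as it is.
        return False
    center_y = image_height / 2
    best = max(records, key=lambda rec: rec["area"] * abs(rec["y"] - center_y))
    return best["y"] > center_y  # Y > H/2


def rotate_180(image):
    """Rotate a row-major image (list of rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(image)]


def correct_orientation(image, records):
    """Return the image to store: rotated by 180 degrees if judged upside down, else unchanged."""
    if is_upside_down(records, len(image)):
        return rotate_180(image)
    return image


if __name__ == "__main__":
    page = [[1, 2], [3, 4]]
    header_like = [{"area": 5, "y": 0}, {"area": 9, "y": 2}]
    print(correct_orientation(page, header_like))  # [[4, 3], [2, 1]]
```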

In this embodiment, an electronic device is also provided, including: a processor; a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the file scanning method described in this application.

In this embodiment, a computer-readable storage medium is also provided, storing a computer program which, when executed by a processor, performs the file scanning method described in this application.

In this embodiment, a computer program product is also provided, including a computer program which, when executed by a processor, implements the file scanning method described in this application.

When scanning a target document containing a header, the document scanning method of the present application identifies the header of the target file by applying contour feature extraction, first filtering, overlap processing, second filtering and distance feature value calculation to the target image. It then judges whether the orientation of the target file is correct from the obtained one-dimensional head-up feature values and the ordinate value of the target image centerline, and decides on that basis whether the target image needs automatic rotation correction. The target images are thus stored in order with no manual intervention: the user does not need to check the position of the document in the scanning device, the document can be placed in the scanning device in any orientation, and no manual correction is needed afterwards, which greatly improves the efficiency of digitizing images of existing physical documents.

The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Those skilled in the art can readily implement the present invention as shown in the accompanying drawings and described above. However, any changes, modifications and equivalent variations that those skilled in the art make using the technical content disclosed above, without departing from the scope of the technical solution of the present invention, are equivalent embodiments of the present invention. Likewise, any equivalent changes, modifications and variations made to the above embodiments according to the substantive technology of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims

1. A file scanning method, applied to scanning a target file containing a header, characterized by comprising the following steps:

Receiving the target image: receiving the target image obtained by scanning the target file;

Extracting contour features: performing contour feature extraction on the characters in the target image to obtain a contour feature structured information set containing several pieces of contour feature structured information, each piece including an area value and coordinate information, the coordinate information including an abscissa value and an ordinate value;

First filtering: filtering all the contour feature structured information in the contour feature structured information set according to a pre-stored first feature filtering threshold, and saving the contour feature structured information whose area value is greater than the first feature filtering threshold as first contour feature structured information;

Overlap processing: judging whether the projections of the first contour feature structured information onto the ordinate direction overlap, and saving the first contour feature structured information whose ordinate-direction projections overlap as second contour feature structured information;

Second filtering: filtering all the first contour feature structured information according to a pre-stored second feature filtering threshold, and saving the first contour feature structured information whose area value is greater than the second feature filtering threshold as the second contour feature structured information;

Calculating the distance feature value: calculating the one-dimensional distance feature value corresponding to each piece of second contour feature structured information from the pre-stored ordinate value of the target image centerline and the ordinate value in that second contour feature structured information;

Calculating the one-dimensional head-up feature value: calculating the corresponding one-dimensional head-up feature value from the area value and the one-dimensional distance feature value of each piece of second contour feature structured information;

Facing judgment: judging whether the ordinate value in the second contour feature structured information corresponding to the largest one-dimensional head-up feature value is greater than the ordinate value of the target image centerline; if so, the target image is upside down and the rotation correction step is performed; if not, the target image is upright and is output to the host computer for storage;

Rotation correction: rotating the target image by 180° and outputting the rotated target image to the host computer for storage.
2. The document scanning method according to claim 1, characterized in that, before the first filtering step, the method further includes calculating the first feature filtering threshold: sorting all the contour feature structured information in the contour feature structured information set by area value from largest to smallest, taking the number of pieces of contour feature structured information as a first quantity value, calculating a first target serial number from a first preset coefficient and the first quantity value, and storing the area value of the contour feature structured information at the first target serial number as the first feature filtering threshold.
3. The document scanning method according to claim 1, characterized in that, before the second filtering step, the method further includes calculating a second feature filtering threshold: sorting the second contour feature structured information by area value from largest to smallest, taking the number of pieces of second contour feature structured information as a second quantity value, calculating a second target serial number from the second quantity value and a second preset coefficient, and taking the area value of the contour feature structured information at the second target serial number as the second feature filtering threshold.
4. The document scanning method according to claim 1, characterized in that, before the step of extracting contour features, the method further includes image preprocessing, in which the target image is binarized.
5. The document scanning method according to claim 4, characterized in that the image preprocessing specifically comprises binarizing the target image using an average gray threshold method.
6. The document scanning method according to claim 1, characterized in that a connected domain segmentation algorithm is used to extract contour features from the characters in the target image.
7. The document scanning method according to claim 1, characterized in that calculating the one-dimensional head-up feature value specifically comprises taking the product of the area value and the one-dimensional distance feature value of each piece of second contour feature structured information as the corresponding one-dimensional head-up feature value.
8. An electronic device, characterized by comprising: a processor;

a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the file scanning method according to any one of claims 1-7.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the file scanning method according to any one of claims 1-7.
10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the file scanning method according to any one of claims 1-7 is implemented.