CN110097059A

CN110097059A - Document image binarization method, system and device based on a generative adversarial network

Info

Publication number: CN110097059A
Application number: CN201910222323.8A
Authority: CN
Inventors: 肖柏华; 赵晋媛; 贾馥溪; 王春恒
Original assignee: Institute of Automation, Chinese Academy of Sciences
Current assignee: Institute of Automation, Chinese Academy of Sciences
Priority date: 2019-03-22
Filing date: 2019-03-22
Publication date: 2019-08-06
Anticipated expiration: 2039-03-22
Also published as: CN110097059B

Abstract

The invention belongs to the field of image processing, and in particular relates to a document image binarization method, system and device based on a generative adversarial network. It aims to solve the problem that, when the quality of document images is uneven, the binarization accuracy of existing methods is unstable and their robustness is poor. The method of the invention comprises: splitting the original document image into image blocks; binarizing the split image blocks and a normalized copy of the whole original document image separately with a first convolutional neural network; stitching and scaling the resulting binary images back to the size of the original document image and merging them with the grayscale image of the original document image; splitting the merged image again, binarizing the resulting blocks with a second convolutional neural network, and merging the binary image blocks to obtain the final binary image. For photographed document images of many types, the invention yields binary images of higher accuracy, with high stability and strong robustness.

Description

Document image binarization method, system and device based on a generative adversarial network

Technical field

The invention belongs to the field of image processing, and in particular relates to a document image binarization method, system and device based on a generative adversarial network.

Background art

In recent years, with the rapid development of network technology, humanity has entered the information era. Traditional information carriers such as books, newspapers and periodicals are inconvenient to carry, require a large amount of storage space, and are difficult to edit, organize and disseminate. People increasingly prefer to store information on electronic devices such as disks, so quickly entering the text of paper materials into computers is of great significance, and OCR (Optical Character Recognition) technology arose for this purpose. OCR enables fast, automatic entry of text information, saves a large amount of human labor, and is now widely used.

The success of OCR depends on the preprocessing of the text image: if the image can be binarized well, OCR recognition accuracy improves greatly, so binarization has great research value. In practical applications, the quality of text images varies widely and may suffer from unclear printing, noise and other interference. When the quality of document images is uneven, the binarization accuracy of existing methods is unstable and their robustness is poor.

Summary of the invention

To solve the above problem in the prior art, namely that the binarization accuracy of existing methods is unstable and their robustness is poor when the quality of document images is uneven, a first aspect of the present invention proposes a document image binarization method based on a generative adversarial network, the method comprising:

Step S10: from the input original document image, obtain multiple image blocks of a preset first size according to a set step size, as a first image block set;

Step S20: for the first image block set, obtain the binary map of each image block through a first convolutional neural network, giving a second image block set; normalize the original document image to the first size and obtain its binary map through the first convolutional neural network, as a first binary map;

Step S30: stitch the image blocks in the second image block set to obtain a second binary map; scale the first binary map to the size of the original document image as a third binary map; obtain the grayscale image of the original document image; merge the second binary map, the third binary map and the grayscale image of the original document image to obtain a three-channel image;

Step S40: split the three-channel image with the method of step S10 to obtain a third image block set, and obtain the binary map of each image block through a second convolutional neural network, as a fourth image block set;

Step S50: stitch the image blocks in the fourth image block set to obtain the final binary map of the original document image;

wherein the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of the generative adversarial network, and their parameters are optimized through training.

In some preferred embodiments, the discriminator of the generative adversarial network is a patch-based fully convolutional neural network;

the first convolutional neural network and the second convolutional neural network are two semantic segmentation networks with identical structure; the first convolutional neural network generates a binary image from the contextual information of local regions; the second convolutional neural network corrects the output of the first convolutional neural network according to the difference in contextual information between text and background.

In some preferred embodiments, the loss function L_loss used when training the generative adversarial network is

$$L_{loss} = L_{cGAN}(G, D) + \gamma L_{L1}(G)$$

$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x, z)))]$$

$$L_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x, z) \rVert_1]$$

where G and D denote the generator and the discriminator of the adversarial network, respectively; L_cGAN(G, D) is the adversarial loss for training the generator and the discriminator; L_L1(G) is the L1 loss between the image generated by the generator and the ground-truth binary image; x is the input image; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the ground-truth binary image; γ is the weight coefficient between the two losses; and D(x, y) is the discriminator output for the input image and the corresponding ground-truth binary sample.

In some preferred embodiments, the first convolutional neural network and the second convolutional neural network each comprise five convolutional layers and five deconvolution layers.

In some preferred embodiments, for each image block in the first image block set, the central region of a second size does not overlap with the other image blocks in the first image block set.

In some preferred embodiments, the first size is A×A and the second size is B×B;

based on the upper-left point [a, b] of an image block, the upper-left points of its four adjacent image blocks are determined as follows:

the upper-left point of the left adjacent image block is [a-A+(B/2), b];

the upper-left point of the right adjacent image block is [a+A-(B/2), b];

the upper-left point of the upper adjacent image block is [a, b-A+(B/2)];

the upper-left point of the lower adjacent image block is [a, b+A-(B/2)].

In some preferred embodiments, the first size is 256×256 and the second size is 128×128.

A second aspect of the present invention proposes a document image binarization system based on a generative adversarial network, comprising a splitting module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module, and a final binary map acquisition module;

the splitting module is configured to obtain, from the input text image and according to a set step size, multiple image blocks of a preset first size, constructing an image block set;

the first convolutional neural network processing module is configured to obtain the first image block set from the original document image through the splitting module and obtain the binary map of each image block through the first convolutional neural network, giving the second image block set; and to normalize the original document image to the first size and obtain its binary map through the first convolutional neural network, as the first binary map;

the three-channel image acquisition module is configured to stitch the image blocks in the second image block set to obtain the second binary map; scale the first binary map to the size of the original document image as the third binary map; obtain the grayscale image of the original document image; and merge the second binary map, the third binary map and the grayscale image of the original document image to obtain the three-channel image;

the second convolutional neural network processing module is configured to obtain a third image block set from the three-channel image through the splitting module, and obtain the binary map of each image block through the second convolutional neural network, as the fourth image block set;

the final binary map acquisition module is configured to stitch the image blocks in the fourth image block set to obtain the final binary map of the original document image.

A third aspect of the present invention proposes a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above document image binarization method based on a generative adversarial network.

A fourth aspect of the present invention proposes a processing device comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above document image binarization method based on a generative adversarial network.

Beneficial effects of the present invention:

For photographed document images of many types, the present invention obtains binary images of higher accuracy with high stability and strong robustness. In addition, by using two convolutional neural networks, the invention adapts well to extracting text from document images and can overcome interference from non-text noise.

Brief description of the drawings

Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the drawings:

Fig. 1 is a schematic flowchart of the document image binarization method based on a generative adversarial network according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of splitting the original document image in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the generator part of the generative adversarial network in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the discriminator of the generative adversarial network in an embodiment of the present invention;

Fig. 5 shows example results obtained through the first convolutional neural network in an embodiment of the present invention;

Fig. 6 shows example input images of the second convolutional neural network in an embodiment of the present invention;

Fig. 7 shows examples of the final binary map of the original document image obtained in an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.

To describe the present invention more clearly, each part of an embodiment of the invention is described in detail below with reference to Figs. 1 to 7.

The present invention performs binarization with two cascaded convolutional neural networks. For a clearer explanation, the composition and training of the two convolutional neural networks are described first, followed by the document image binarization method based on a generative adversarial network that uses the two trained networks.

1. Composition and training of the two convolutional neural networks

The first and second convolutional neural networks are cascaded to form the generator of the generative adversarial network, on the basis of which the adversarial network is constructed.

(1) Generator

In the designed generative adversarial network, the first and second convolutional neural networks are two semantic segmentation networks (U-Net) with identical structure, cascaded together. Each U-Net contains five convolutional layers and five deconvolution layers, which ensures that the input and output images have the same size. The two U-Nets play different roles: the first generates a binary image mainly from the contextual information of local regions while preserving as much text detail as possible; the second corrects the result of the first part based on the differences in contextual information between text and background at different scales, so as to further remove background noise. The generator structure is shown in the boxed part in the middle of Fig. 3, where G1 is the first convolutional neural network and G2 is the second convolutional neural network.
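The statement that five convolution layers and five deconvolution layers keep the input and output sizes equal can be checked with simple size arithmetic. The sketch below assumes pix2pix-style layers with kernel size 4, stride 2 and padding 1; these hyperparameters are not stated in the patent and are illustrative only.

```python
# Size bookkeeping for a U-Net with five stride-2 convolutions followed by
# five stride-2 transposed convolutions (deconvolutions).
# Assumed hyperparameters (not given in the patent): kernel 4, stride 2, padding 1.


def conv_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Output width of a convolution layer for input width n."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Output width of a transposed convolution layer for input width n."""
    return (n - 1) * s - 2 * p + k


def unet_sizes(n: int, depth: int = 5) -> list[int]:
    """Widths of the input, then after each encoder layer, then after each decoder layer."""
    sizes = [n]
    for _ in range(depth):  # encoder: five convolution layers
        sizes.append(conv_out(sizes[-1]))
    for _ in range(depth):  # decoder: five deconvolution layers
        sizes.append(deconv_out(sizes[-1]))
    return sizes


if __name__ == "__main__":
    print(unet_sizes(256))  # [256, 128, 64, 32, 16, 8, 16, 32, 64, 128, 256]
```

Each encoder width has a mirror-image decoder width, which is what lets U-Net skip connections concatenate encoder and decoder features of the same spatial size.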

(2) Discriminator

The discriminator is a patch-based fully convolutional neural network. Its purpose is to distinguish the binary image generated by the generator from the ground-truth binary image. The specific network structure is shown in Fig. 4: the binary image generated by the generator and the ground-truth binary image corresponding to the input sample are each paired with the input image and judged. The pair formed by the generated binary map and the input image should be judged false, and the pair formed by the standard binary map of the original image and the input image should be judged true.
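A patch-based discriminator outputs a grid of real/fake scores, one per image region, rather than a single score for the whole image. The sketch below illustrates that idea by scoring non-overlapping patches of an (input, binary map) pair with a caller-supplied function. `score_patch` is a hypothetical stand-in for the learned network, and a real fully convolutional discriminator produces overlapping receptive fields rather than disjoint patches, so this is a simplification.

```python
from typing import Callable, List

Image = List[List[float]]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size patch whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def patch_score_map(x: Image, y: Image, size: int,
                    score_patch: Callable[[Image, Image], float]) -> List[List[float]]:
    """Score every non-overlapping size x size patch of the pair (x, y).

    x is the input image and y a candidate binary map (generated or ground truth);
    score_patch is a hypothetical stand-in for the discriminator network.
    """
    h, w = len(x), len(x[0])
    return [[score_patch(crop(x, top, left, size), crop(y, top, left, size))
             for left in range(0, w - size + 1, size)]
            for top in range(0, h - size + 1, size)]


def image_score(score_map: List[List[float]]) -> float:
    """Average the per-patch scores into a single real/fake score for the pair."""
    flat = [s for row in score_map for s in row]
    return sum(flat) / len(flat)
```

Because every patch is judged separately, the discriminator penalizes local artifacts such as residual background specks wherever they occur, which suits a task whose errors are mostly local.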

(3) Loss function

The loss function L_loss used when training the adversarial network is

$$L_{loss} = L_{cGAN}(G, D) + \gamma L_{L1}(G)$$

$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x, z)))]$$

$$L_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x, z) \rVert_1]$$

where G and D denote the generator and the discriminator of the adversarial network, respectively; L_cGAN(G, D) is the adversarial loss for training the generator and the discriminator; L_L1(G) is the L1 loss between the image generated by the generator and the ground-truth binary image; x is the input image; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the ground-truth binary image; γ is the weight coefficient between the two losses (γ = 1 in some embodiments); and D(x, y) is the discriminator output for the input image and the corresponding ground-truth binary sample.
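A minimal numeric sketch of the two loss terms follows. It assumes the terms are combined as L_cGAN + γ·L_L1 and that the L1 term is averaged over pixels (a common convention, not confirmed by the patent). Images are flattened to plain lists, discriminator outputs are probabilities in (0, 1), and all function names are illustrative.

```python
import math
from typing import Sequence


def cgan_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Sample estimate of L_cGAN = E[log D(x, y)] + E[log(1 - D(x, G(x, z)))].

    d_real: discriminator outputs on (input, ground-truth) pairs, each in (0, 1).
    d_fake: discriminator outputs on (input, generated) pairs, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_loss(y: Sequence[float], g: Sequence[float]) -> float:
    """Mean absolute difference between ground-truth pixels y and generated pixels g."""
    if len(y) != len(g):
        raise ValueError("y and g must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(y, g)) / len(y)


def total_loss(d_real, d_fake, y, g, gamma: float = 1.0) -> float:
    """Assumed combination L_loss = L_cGAN + gamma * L_L1 (gamma = 1 in some embodiments)."""
    return cgan_loss(d_real, d_fake) + gamma * l1_loss(y, g)


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere.
    print(cgan_loss([0.5], [0.5]))              # 2*log(0.5), about -1.386
    print(l1_loss([1, 0, 1, 0], [1, 1, 1, 1]))  # 0.5
```

In the usual adversarial setup the discriminator is updated to increase L_cGAN while the generator is updated to decrease the total; the L1 term does not depend on D and pulls the generated binary map toward the ground truth pixel by pixel.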

2, the method for the present invention

The file and picture binary coding method based on generation confrontation network of an embodiment of the present invention, as shown in Figure 1, the party Method includes:

Step S10 obtains the multiple images of default first size according to setting step-length from the original document image of input Block, as the first image block set.

In the present embodiment, each image block in the first image block set, the second size area of picture centre not with Other image blocks are overlapped in first image block set.

For example, default first size is A*A (such as can be 256*256), second (such as can be having a size of B*B 128*128), original file and picture of taking pictures is cut into the image block of A*A size, the B* at each image block center according to a fixed step size B area is not overlapped, and is not overlapped to realize, the position of adjacent image block can be determined using lower method:

Upper left point [a, b] image block based determines the upper left point of four adjacent image blocks of the image block::

For example, first size is 256*256 in one embodiment, second having a size of 128*128, an image block upper left Point is [a, b], then the upper left point coordinate of corresponding left side adjacent image block is [a-256+64, b]；A left side for right side adjacent image block Upper coordinate is [a+256-64, b]；The upper left point coordinate of top adjacent image block is [a, b-256+64]；Lower section adjacent image The upper left point coordinate of block is [a, b+256-64].
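The neighbour rule above fixes the step between adjacent blocks at A - B/2 (192 pixels for A = 256, B = 128). The sketch below computes neighbour corners and block origins along one axis. How image borders are handled is an assumption, since the patent does not describe it: here the page is assumed to be padded on the right and bottom so that the last block fits.

```python
import math
from typing import Dict, List, Tuple


def neighbor_corners(a: int, b: int, A: int = 256, B: int = 128) -> Dict[str, Tuple[int, int]]:
    """Upper-left points of the four blocks adjacent to the block at [a, b]."""
    step = A - B // 2
    return {"left": (a - step, b), "right": (a + step, b),
            "up": (a, b - step), "down": (a, b + step)}


def tile_origins(length: int, A: int = 256, B: int = 128) -> List[int]:
    """Block start offsets along one axis of a page `length` pixels long.

    Assumption: the page is padded on the right/bottom up to origins[-1] + A,
    so every block lies inside the padded page.
    """
    step = A - B // 2
    n = max(1, math.ceil((length - A) / step) + 1)
    return [i * step for i in range(n)]


def block_corners(width: int, height: int, A: int = 256, B: int = 128) -> List[Tuple[int, int]]:
    """All block upper-left points [x, y], row by row."""
    return [(x, y) for y in tile_origins(height, A, B) for x in tile_origins(width, A, B)]


if __name__ == "__main__":
    print(neighbor_corners(500, 500))
    print(tile_origins(1000))  # [0, 192, 384, 576, 768]; padded length 1024
```

With this step, each block's central B×B region (offsets 64 to 192 inside a 256-pixel block) ends exactly where the next block begins, so the central region never overlaps another block, while adjacent blocks share a band of B/2 = 64 pixels.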

Fig. 2 shows an exemplary schematic diagram of splitting an original document image; the lines in the figure indicate the correspondence between positions in the original document image and the split image blocks.

Step S20: for the first image block set, obtain the binary map of each image block through the first convolutional neural network, giving a second image block set; normalize the original document image to the first size and obtain its binary map through the first convolutional neural network, as a first binary map.

In this embodiment, each image block in the first image block set is input into the trained first convolutional neural network to obtain the initial binarization result of that image block, giving the second image block set. At the same time, the whole original document image is normalized to A×A (e.g., 256×256) and binarized by the first convolutional neural network, and the result is taken as the first binary map.
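Step S20 normalizes the whole page to A×A, and step S30 later scales the resulting first binary map back to the page size. The patent does not name an interpolation method; the sketch below assumes nearest-neighbour resampling, which keeps a binary map strictly binary when it is enlarged.

```python
from typing import List, Sequence

Image = List[List[float]]


def resize_nearest(img: Sequence[Sequence[float]], out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize of an H x W image (list of rows) to out_h x out_w.

    Output pixel (r, c) copies input pixel (floor(r * H / out_h), floor(c * W / out_w)),
    so no new values are created and a 0/1 map stays 0/1.
    """
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

For example, `resize_nearest(page, 256, 256)` would produce the normalized input for the first network, and `resize_nearest(first_binary_map, page_height, page_width)` the third binary map. Normalizing to a square A×A changes the aspect ratio of non-square pages; the rescaling in step S30 undoes this.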

Fig. 5 shows example results in an embodiment of the present invention, obtained by stitching the binary image blocks generated by the first convolutional neural network back to the original image size; five examples, (a) to (e), are given.

Step S30: stitch the image blocks in the second image block set to obtain a second binary map; scale the first binary map to the size of the original document image as a third binary map; obtain the grayscale image of the original document image; merge the second binary map, the third binary map and the grayscale image of the original document image to obtain a three-channel image.

In this embodiment, the input image of the second convolutional neural network consists of three channels, so this step first obtains the three-channel image as follows:

the image blocks of the second image block set obtained in step S20 are combined and stitched using the splitting information from step S10 to restore the preliminary binarization result of the original document image; this is the second binary map and serves as the first channel of the input image of the second convolutional neural network;

the first binary map obtained in step S20 is scaled to the size of the original document image as the third binary map, which serves as the second channel of the input image of the second convolutional neural network;

the grayscale image of the original document image is obtained as the third channel of the input image of the second convolutional neural network;

the second binary map, the third binary map and the grayscale image of the original document image are merged to obtain the three-channel image.
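The three-channel input of the second network can be assembled per pixel as shown below. Two choices here are assumptions not specified in the patent: the grayscale conversion uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), and grayscale values are scaled to [0, 1] so that all three channels share the range of the binary maps.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def to_gray(rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> List[List[float]]:
    """Convert rows of 8-bit (r, g, b) pixels to grayscale in [0, 1] (assumed BT.601 weights)."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in rgb]


def merge_three_channels(second, third, gray) -> List[List[Pixel]]:
    """Stack three page-sized maps per pixel.

    Channel 0: second binary map (stitched first-network block results).
    Channel 1: third binary map (whole-page first-network result, rescaled).
    Channel 2: grayscale image of the original document image.
    """
    h, w = len(gray), len(gray[0])
    for m in (second, third):
        if len(m) != h or len(m[0]) != w:
            raise ValueError("all three maps must have the size of the original page")
    return [[(second[r][c], third[r][c], gray[r][c]) for c in range(w)]
            for r in range(h)]
```

Keeping the grayscale page as the third channel gives the second network access to the original intensities, so it can reconsider pixels that the two first-stage binary maps disagree on.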

Fig. 6 shows two examples of three-channel images.

Step S40: split the three-channel image with the method of step S10 to obtain a third image block set, and obtain the binary map of each image block through the second convolutional neural network, as a fourth image block set.

Step S50: stitch the image blocks in the fourth image block set to obtain the final binary map of the original document image.

In this embodiment, the image blocks of the fourth image block set obtained in step S40 are combined and stitched using the splitting information from step S10 to restore the binarization result image corresponding to the original document image, which is taken as the final binary map of the original document image.
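Steps S30 and S50 both reassemble overlapping blocks into a full-size map using the positions from step S10, but the patent does not say how pixels covered by more than one block are resolved. The sketch below assumes overlapping predictions are averaged and then thresholded; another plausible reading is to keep only each block's central B×B region. Parts of blocks that extend past the page (because of border padding) are cropped.

```python
from typing import List, Sequence, Tuple


def stitch(blocks: Sequence[Sequence[Sequence[float]]],
           origins: Sequence[Tuple[int, int]],
           height: int, width: int, threshold: float = 0.5) -> List[List[int]]:
    """Reassemble per-block binary maps into a height x width page.

    blocks[i] is placed with its upper-left point at origins[i] = (x, y).
    Pixels covered by several blocks get the mean of their predictions, which is
    then thresholded; parts of blocks outside the page are cropped, and pixels
    that no block covers are set to 0.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for block, (x0, y0) in zip(blocks, origins):
        for r, row in enumerate(block):
            y = y0 + r
            if y >= height:
                break  # remaining rows fall in the padding
            for c, value in enumerate(row):
                x = x0 + c
                if x >= width:
                    break  # remaining columns fall in the padding
                acc[y][x] += value
                cnt[y][x] += 1
    return [[1 if cnt[y][x] and acc[y][x] / cnt[y][x] >= threshold else 0
             for x in range(width)] for y in range(height)]
```

Averaging lets two overlapping blocks vote on the shared band, which can smooth visible seams at block borders; the threshold then restores a strictly binary output.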

The binarization process of the present invention can also be seen in Fig. 3. The input image (the original document image) is split into an image block set, normalized by scaling, and converted to grayscale to obtain its grayscale image. The binary image blocks obtained by passing the split image blocks through G1 are merged into picture (1); the normalized original image is binarized by G1 to obtain picture (2). Picture (1), picture (2) and the grayscale image of the original document image are merged and then split again; the resulting blocks are binarized by G2, and merging these binary blocks yields the final binary map.

Fig. 7 shows examples of the final binary map of the original document image obtained in an embodiment of the present invention, including five results, (a) to (e), which correspond respectively to the examples in Fig. 5.

The document image binarization system based on a generative adversarial network according to an embodiment of the present invention comprises a splitting module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module, and a final binary map acquisition module.

The splitting module is configured to obtain, from the input text image and according to a set step size, multiple image blocks of a preset first size, constructing an image block set.

The first convolutional neural network processing module is configured to obtain the first image block set from the original document image through the splitting module and obtain the binary map of each image block through the first convolutional neural network, giving the second image block set; and to normalize the original document image to the first size and obtain its binary map through the first convolutional neural network, as the first binary map.

The three-channel image acquisition module is configured to stitch the image blocks in the second image block set to obtain the second binary map; scale the first binary map to the size of the original document image as the third binary map; obtain the grayscale image of the original document image; and merge the second binary map, the third binary map and the grayscale image of the original document image to obtain the three-channel image.

The second convolutional neural network processing module is configured to obtain a third image block set from the three-channel image through the splitting module, and obtain the binary map of each image block through the second convolutional neural network, as the fourth image block set.

The final binary map acquisition module is configured to stitch the image blocks in the fourth image block set to obtain the final binary map of the original document image.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and related explanations may refer to the corresponding process in the foregoing method embodiment, and are not repeated here.

It should be noted that the document image binarization system based on a generative adversarial network provided by the above embodiment is illustrated only by the above division of functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be merged into one module, or further split into multiple sub-modules, to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention only distinguish the modules or steps and are not to be construed as improper limitations of the present invention.

A storage device according to an embodiment of the present invention stores a plurality of programs adapted to be loaded and executed by a processor to implement the above document image binarization method based on a generative adversarial network.

A processing device according to an embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above document image binarization method based on a generative adversarial network.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the storage device and the processing device described above and related explanations may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.

Those skilled in the art should recognize that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of both, and that programs corresponding to software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The terms "first", "second", etc. are used to distinguish similar objects, not to describe or indicate a specific order or sequence.

The term "comprising" or any similar term is intended to cover non-exclusive inclusion, so that a process, method, article or device/apparatus comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device/apparatus.

The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims

1. A document image binarization method based on a generative adversarial network, characterized in that the method comprises:

Step S10: obtaining, from the input original document image and according to a set step size, multiple image blocks of a preset first size, as a first image block set;

Step S20: for the first image block set, obtaining the binary map of each image block through a first convolutional neural network, giving a second image block set; normalizing the original document image to the first size and obtaining its binary map through the first convolutional neural network, as a first binary map;

Step S30: stitching the image blocks in the second image block set to obtain a second binary map; scaling the first binary map to the size of the original document image as a third binary map; obtaining the grayscale image of the original document image; merging the second binary map, the third binary map and the grayscale image of the original document image to obtain a three-channel image;

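Step S40: splitting the three-channel image with the method of step S10 to obtain a third image block set, and obtaining the binary map of each image block through a second convolutional neural network, as a fourth image block set;

Step S50: stitching the image blocks in the fourth image block set to obtain the final binary map of the original document image;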

wherein the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of the generative adversarial network, and their parameters are optimized through training.

2. The document image binarization method based on a generative adversarial network according to claim 1, characterized in that the discriminator of the generative adversarial network is a patch-based fully convolutional neural network;

the first convolutional neural network and the second convolutional neural network are two semantic segmentation networks with identical structure; the first convolutional neural network is used to generate a binary image from the contextual information of local regions; the second convolutional neural network is used to correct the output of the first convolutional neural network according to the difference in contextual information between text and background.

3. The document image binarization method based on a generative adversarial network according to claim 2, characterized in that the loss function L_loss used when training the generative adversarial network is

$$L_{loss} = L_{cGAN}(G, D) + \gamma L_{L1}(G)$$

$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x, z)))]$$

$$L_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x, z) \rVert_1]$$

where G and D denote the generator and the discriminator of the adversarial network, respectively; L_cGAN(G, D) is the adversarial loss for training the generator and the discriminator; L_L1(G) is the L1 loss between the image generated by the generator and the ground-truth binary image; x is the input image; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the ground-truth binary image; γ is the weight coefficient between the two losses; and D(x, y) is the discriminator output for the input image and the corresponding ground-truth binary sample.

4. The document image binarization method based on a generative adversarial network according to claim 2, wherein the first convolutional neural network and the second convolutional neural network each comprise five convolutional layers and five deconvolutional layers.
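Claim 4 fixes only the number of layers. Assuming 4×4 kernels with stride 2 and padding 1 in every layer (a common choice for encoder-decoder segmentation networks, but not stated in the patent), each convolution halves the feature map and each deconvolution doubles it, so a 256×256 block of the size given in claim 7 is reduced to 8×8 and restored to 256×256. The standard output-size formulas for convolution and transposed convolution trace this as follows; the function names are hypothetical.

```python
from typing import List


def conv_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Spatial size after a convolution (no dilation)."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Spatial size after a transposed convolution (no dilation, no output padding)."""
    return (n - 1) * s - 2 * p + k


def trace_sizes(n: int = 256, depth: int = 5) -> List[int]:
    """Feature-map side length through `depth` conv layers followed by `depth` deconv layers."""
    sizes = [n]
    for _ in range(depth):
        sizes.append(conv_out(sizes[-1]))
    for _ in range(depth):
        sizes.append(deconv_out(sizes[-1]))
    return sizes


print(trace_sizes())  # [256, 128, 64, 32, 16, 8, 16, 32, 64, 128, 256]
```

Under these assumptions the output of each network has the same size as its input, which is what allows the binarized blocks to be stitched back at their original positions.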

5. The document image binarization method based on a generative adversarial network according to any one of claims 1-4, wherein, for each image block in the first image block set, the central region of the second size does not overlap with the other image blocks in the first image block set.

6. The document image binarization method based on a generative adversarial network according to claim 5, wherein the first size is A×A and the second size is B×B;

7. The document image binarization method based on a generative adversarial network according to claim 6, wherein the first size is 256×256 and the second size is 128×128.
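One way to realise the cropping of claims 5 to 7 is sketched below, under assumptions the claims do not state: the step size equals B, and the image is padded by the margin (A − B)/2 on every side (and up to a multiple of B) with a white value of 1.0, so that the central B×B regions of the blocks tile the original image without overlapping one another. The function name `crop_blocks` and the padding value are hypothetical. Each block is returned with the (top, left) position of its central region in original-image coordinates.

```python
from typing import List, Tuple

Image = List[List[float]]
Block = Tuple[Tuple[int, int], Image]  # ((top, left) of the central region, A x A block)


def crop_blocks(img: Image, a: int = 256, b: int = 128, pad: float = 1.0) -> List[Block]:
    """Cut img into a x a blocks with stride b; the central b x b regions tile img without overlap."""
    if a <= b or (a - b) % 2:
        raise ValueError("need a > b and an even margin a - b")
    m = (a - b) // 2                      # margin between block edge and central region
    h, w = len(img), len(img[0])
    rows = -(-h // b) * b                 # round height up to a multiple of b
    cols = -(-w // b) * b                 # round width up to a multiple of b
    # Pad by m on top/left and by m plus the rounding remainder on bottom/right.
    padded = [[pad] * (cols + 2 * m) for _ in range(rows + 2 * m)]
    for r in range(h):
        for c in range(w):
            padded[r + m][c + m] = img[r][c]
    blocks: List[Block] = []
    for top in range(0, rows, b):
        for left in range(0, cols, b):
            # Block spans padded[top:top+a]; its centre is img[top:top+b][left:left+b].
            block = [row[left:left + a] for row in padded[top:top + a]]
            blocks.append(((top, left), block))
    return blocks
```

With the defaults A = 256 and B = 128, each block carries a 64-pixel margin of context on every side of the region it contributes to the output.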

8. A document image binarization system based on a generative adversarial network, wherein the system comprises a cropping module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module, and a final binary map acquisition module;

the cropping module is configured to obtain, from an input document image and according to a set step size, a plurality of image blocks of a preset first size, and to construct an image block set;

the first convolutional neural network processing module is configured to obtain, through the first convolutional neural network, the binary map of each image block in the first image block set that the cropping module obtains from the original document image, yielding a second image block set; and to normalize the original document image to the first size and obtain its binary map through the first convolutional neural network as a first binary map;

the three-channel image acquisition module is configured to stitch the image blocks in the second image block set to obtain a second binary map; to scale the first binary map to the size of the original document image as a third binary map; to obtain the grayscale image of the original document image; and to merge the second binary map, the third binary map, and the grayscale image of the original document image into a three-channel image;

the second convolutional neural network processing module is configured to obtain, through the cropping module, a third image block set from the three-channel image, and to obtain the binary map of each image block through the second convolutional neural network, as a fourth image block set;

the final binary map acquisition module is configured to stitch the image blocks in the fourth image block set to obtain the final binary map of the original document image.
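The stitching performed in step S50 and by the final binary map acquisition module can be sketched as follows, assuming (consistent with claims 5 to 7) that each block is A×A, that only its central B×B region is written back, and that each block is tagged with the (top, left) position of that central region in the original image. This position convention and the name `stitch_centres` are hypothetical; the overlapping margins are simply discarded rather than blended.

```python
from typing import List, Tuple

Image = List[List[float]]
Block = Tuple[Tuple[int, int], Image]  # ((top, left) of the central region, A x A block)


def stitch_centres(blocks: List[Block], h: int, w: int, b: int) -> Image:
    """Paste the central b x b region of each block at its (top, left) position, cropped to h x w."""
    out = [[0.0] * w for _ in range(h)]
    for (top, left), block in blocks:
        m = (len(block) - b) // 2          # margin between block edge and central region
        for dr in range(b):
            for dc in range(b):
                r, c = top + dr, left + dc
                if r < h and c < w:        # skip positions that fall in the padding
                    out[r][c] = block[m + dr][m + dc]
    return out
```

Because the central regions do not overlap, every pixel of the output is written exactly once, so no averaging between neighbouring blocks is needed.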

9. A storage device storing a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the document image binarization method based on a generative adversarial network according to any one of claims 1-7.

10. A processing device, comprising a processor and a storage device, the processor being adapted to execute programs and the storage device being adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by the processor to implement the document image binarization method based on a generative adversarial network according to any one of claims 1-7.