CN106874906B - Image binarization method and device and terminal - Google Patents


Info

Publication number
CN106874906B
CN106874906B (application CN201710031170.XA)
Authority
CN
China
Prior art keywords
binarization
picture
confidence
processing result
pixels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710031170.XA
Other languages
Chinese (zh)
Other versions
CN106874906A (en)
Inventor
刘银松
郭安泰
Current Assignee
Tencent Technology Shanghai Co Ltd
Original Assignee
Tencent Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shanghai Co Ltd filed Critical Tencent Technology Shanghai Co Ltd
Priority to CN201710031170.XA (patent CN106874906B/en)
Publication of CN106874906A (patent CN106874906A/en)
Priority to PCT/CN2018/072047 (patent WO2018133717A1/en)
Application granted
Publication of CN106874906B (patent CN106874906B/en)
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a picture binarization method, a device, and a terminal. The picture to be processed is simply processed independently by several strongly complementary binarization methods; a learning engine based on optical character recognition then yields a confidence for each single character, from which the text confidence of each result is calculated, so that the optimal processing result can be selected dynamically. Seamless switching among the processing results of the various binarization methods is achieved without attending to global information or local textures. The invention can dynamically select the optimal binarization result in different scenes, thereby meeting the diverse requirements of different scenes and achieving full-scene adaptation of picture binarization.

Description

Image binarization method and device and terminal
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, an apparatus, and a terminal for binarizing a picture.
Background
Image binarization sets the gray value of every pixel of an image to either 0 or 255, so that the whole image presents an unmistakable black-and-white visual effect. Binarization is a basic operation in image processing and is widely applied; correspondingly, the prior art offers many binarization methods, such as the bimodal method, the P-parameter method, the iterative method, and the maximum inter-class variance method.
However, existing binarization schemes binarize a picture with rules fixed for a particular picture scene and thus lack universality; no binarization method currently suits all scenes. Given the diversity of binarization methods and the limitations of each, it is difficult to quickly find a suitable method when pictures from varied scenes must be binarized, which degrades the binarization effect.
Disclosure of Invention
In order to solve the technical problem, the invention provides a binarization method and device for a picture and a terminal.
The invention is realized by the following technical scheme:
in a first aspect, a binarization method for a picture is provided, the method includes:
acquiring a picture to be processed, wherein the picture to be processed comprises characters;
respectively using a plurality of preset binarization processing methods to carry out independent binarization processing on the picture to be processed, wherein each binarization method obtains a processing result;
obtaining a processing result set according to the processing result;
calculating the character confidence of each processing result in the processing result set;
and selecting the processing result with the highest character confidence coefficient as the binarization result of the picture to be processed.
In a second aspect, there is provided a binarization device for a picture, the device comprising:
the image to be processed acquisition module is used for acquiring an image to be processed;
the processing result obtaining module is used for carrying out independent binarization processing on the picture to be processed by using a plurality of preset binarization processing methods respectively, and each binarization method obtains a processing result;
a processing result set obtaining module, configured to obtain a processing result set according to the processing result;
the character confidence coefficient calculation module is used for calculating the character confidence coefficient of each processing result in the processing result set;
and the binarization result obtaining module is used for selecting the processing result with the highest character confidence coefficient as the binarization result of the picture to be processed.
In a third aspect, a binarization terminal for a picture is provided, and the terminal comprises the binarization device for the picture.
The binarization method, the binarization device and the terminal for the picture have the following beneficial effects that:
the invention calculates the character confidence coefficient of the binarization result of the picture to be processed based on optical character recognition, and dynamically selects the optimal binarization method according to the character confidence coefficient, thereby obtaining the optimal binarization result of the picture to be processed. The invention can dynamically select the optimal binarization result in different scenes so as to meet the diversity requirements of different scenes and realize full scene adaptation of picture binarization.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a binarization method for a picture according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for obtaining confidence of characters according to an embodiment of the present invention;
FIG. 3 is a flow chart of a weighted average algorithm provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a sliding window based binarization method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a local binarization method provided by the embodiment of the invention;
fig. 6 is a flowchart of a binarization method based on color value statistics according to an embodiment of the present invention;
FIG. 7 is a block diagram of a convolutional neural network provided by an embodiment of the present invention;
FIG. 8 is a diagram of a pending picture according to an embodiment of the present invention;
fig. 9 is a processing result of the sliding window based binarization method according to the embodiment of the present invention for the picture to be processed in fig. 8;
fig. 10 is a processing result of the binarization method based on color value statistics according to the embodiment of the present invention for the to-be-processed picture in fig. 8;
FIG. 11 is another to-be-processed picture according to an embodiment of the present invention;
fig. 12 is a processing result of the sliding window based binarization method according to the embodiment of the present invention for the picture to be processed in fig. 11;
fig. 13 is a processing result of the binarization method based on color value statistics according to the embodiment of the present invention for the picture to be processed in fig. 11;
fig. 14 is a block diagram of a binarization device for a picture according to an embodiment of the present invention;
FIG. 15 is a block diagram of a text confidence calculation unit according to an embodiment of the present invention;
FIG. 16 is a block diagram of a process result derivation module provided by embodiments of the invention;
fig. 17 is a block diagram of a sliding window binarization unit provided by an embodiment of the present invention;
fig. 18 is a block diagram of a color value statistical binarization unit according to an embodiment of the present invention;
fig. 19 is a block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a binarization method of a picture according to an embodiment of the present invention is shown. The method may include the following steps.
Step 101, obtaining a picture to be processed, wherein the picture to be processed comprises characters.
And 102, carrying out independent binarization processing on the picture to be processed by using a plurality of preset binarization processing methods respectively, wherein each binarization method obtains a processing result.
The preset binarization method can select the existing binarization method, and the number of the preset binarization methods can be two or more than two.
The existing binarization methods are mainly divided into two types, one type is a global method, a uniform segmentation threshold value is determined according to a global visual angle, and binarization is carried out through the segmentation threshold value; the other type is a local self-adaptive method, different threshold values are determined according to the conditions of different areas of the image, and binarization is carried out according to the threshold values.
The global method mostly calculates a segmentation threshold value capable of obtaining the maximum binarization effect according to the global color statistical information of the image, and then simply binarizes according to the segmentation threshold value. The method only has good effect in the image with simple background and single color, and has poor effect on the image with complex texture information or low contrast.
The local self-adaptive method mostly calculates the binarization threshold value according to local texture information, can avoid misjudgment of the global threshold value to a certain extent, but often causes the problems that adjacent local binarization effects are different and adjacent area binarization effects are not consistent due to the fact that the local information is over emphasized and the global overall information is ignored.
Therefore, each existing binarization method can only handle a fixed class of scenes, and its adaptive capability is weak. In order to optimize the binarization effect and improve the accuracy of extracting the characters from the picture acquired in step 101, step 102 enumerates multiple binarization methods, and the optimal one is selected according to the text confidence calculated subsequently. The methods thus complement one another, each contributing its strengths and offsetting the others' weaknesses, which broadens the range of binarization scenes and yields the optimal binarization effect.
And 103, obtaining a processing result set according to the processing result.
In step 102, a plurality of binarization methods are enumerated, and each binarization method obtains a processing result, thereby forming a processing result set.
And 104, calculating the character confidence of each processing result in the processing result set.
The character confidence coefficient is used for representing the probability that characters in the processing result can be accurately identified, and the character confidence coefficient can be used as an evaluation index of the processing effect of the binarization method. If the confidence of the characters is high, the binarization processing effect is ideal, and if the confidence of the characters is low, the binarization processing effect is not ideal.
And 105, selecting a processing result with the highest character confidence as a binarization result of the picture to be processed.
If several processing results share the highest text confidence, one of them is selected as the binarization result of the picture to be processed according to a preset selection method. The preset selection method may be random selection or another selection method.
According to the embodiment of the invention, through enumerating various binarization methods and dynamically selecting an optimal processing result, binarization processing can be carried out on the pictures of all scenes, and a better binarization effect is obtained, so that the compatibility of picture processing is improved; the character confidence is used as an evaluation standard of the image binarization effect, so that the selected processing result can obtain the best character recognition result, and other character processing is favorably carried out on the result in the later period.
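As a minimal sketch of steps 101 to 105 (the function and variable names are illustrative, not from the patent; the candidate methods and the confidence function are supplied by the caller), the pipeline runs every candidate method independently and keeps the highest-scoring result:

```python
import numpy as np

def binarize_best(image, methods, confidence_fn):
    """Run every candidate binarization method on the picture (step 102),
    collect the processing result set (step 103), score each result with a
    text-confidence function (step 104), and keep the best one (step 105)."""
    results = [m(image) for m in methods]          # one result per method
    scores = [confidence_fn(r) for r in results]   # text confidence per result
    best = int(np.argmax(scores))                  # highest confidence wins
    return results[best], scores[best]
```

With this shape, adding a new binarization method is just appending one more callable to `methods`; nothing else in the pipeline changes.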
Further, please refer to fig. 2, which shows a flowchart of the text confidence obtaining method in step 104, including:
step 1041, obtain confidence of each word in the processing result.
Specifically, the processing result is input into a preset learning engine based on Optical Character Recognition (OCR), and the confidence output by the engine is obtained. The learning engine can be a deep learning engine implemented with a Convolutional Neural Network (CNN). Such an engine recognizes single-character pictures well, offers high accuracy and accurate confidences, and outperforms common traditional recognition engines. Moreover, one advantage of a CNN-based deep learning engine over traditional image processing algorithms is that it avoids complex pre-processing of the image (extraction of hand-crafted features and the like) and can take the original image directly as input. For example, a 28 × 28 resolution single-character picture may be input directly and the confidence output directly. Compared with traditional methods, the output confidence is more reliable.
In addition, the conventional Tesseract learning engine and Nhocr engine also support confidence level recognition, and may be used in the present embodiment as well.
In the CNN-based deep learning engine, as in the conventional Tesseract and Nhocr engines, the confidence output for each single character is a decimal between 0 and 1.
And 1042, calculating the character confidence of each processing result according to a preset character confidence algorithm and the confidence of each character.
Specifically, the preset text confidence algorithm includes, but is not limited to, a weighted average algorithm with a weighted average of confidence degrees as text confidence degrees, a geometric average algorithm with a geometric average of confidence degrees as text confidence degrees, a squared average algorithm with a squared average of confidence degrees as text confidence degrees, and a harmonic average algorithm with a harmonic average of confidence degrees as text confidence degrees. Taking the weighted average algorithm as an example, please refer to fig. 3, which shows a flowchart of the weighted average algorithm, including:
s1, setting a weight corresponding to each character.
Taking N characters as an example, the weight values of the characters can be Q respectively according to the sequence of the characters in the picture 0 ……Q n-1 . The weight value corresponding to each character can be randomly set by a programAnd moreover, targeted setting can be performed according to actual needs.
And S2, carrying out weighted summation on the confidence coefficient according to the confidence coefficient of each character and the weight value corresponding to the character.
According to the sequence of the characters in the picture, the confidence of each character can be Z 0 ……Z n-1 The process of weighted summation can be expressed as
Figure BDA0001211512500000071
And S3, dividing the result of the weighted summation by the number of characters in the processing result to obtain a weighted average confidence coefficient.
Weighted average confidence of
Figure BDA0001211512500000072
And S4, taking the weighted average confidence coefficient as a character confidence coefficient.
With weighted average confidence
Figure BDA0001211512500000073
As a confidence of characters applied in step S105
In the embodiment of the invention, selecting different text confidence algorithms and setting different parameters within the selected algorithm can improve the reliability and discriminating power of the text confidence, making it possible to tell multiple processing results apart and single out the optimal one.
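The weighted average algorithm of steps S1 to S4 can be sketched as follows (a minimal rendering; the text leaves the weight choice open, so the weights here are caller-supplied):

```python
def weighted_average_confidence(confidences, weights):
    """Weighted-average text confidence per steps S1-S4: multiply each
    per-character confidence Z_i by its weight Q_i, sum (S2), and divide
    by the number of characters N (S3)."""
    assert len(confidences) == len(weights)
    n = len(confidences)
    weighted_sum = sum(q * z for q, z in zip(weights, confidences))  # S2
    return weighted_sum / n                                          # S3
```

With all weights set to 1 this reduces to the plain arithmetic mean, which is a convenient baseline before tuning weights for a specific application.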
Further, in step 102, an existing binarization method may be used, or a custom binarization method may also be used, taking three existing common binarization methods as examples:
the method comprises the following steps: direct binarization
After graying the image, every pixel value of the image is scanned: pixels with a gray value less than 127 are set to 0 (black), and pixels with a gray value greater than or equal to 127 are set to 255 (white). The advantages of this method are a small amount of calculation and high speed; its disadvantage is that the pixel distribution and pixel-value characteristics of the image are not considered.
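Method one can be sketched in a few lines (a hypothetical NumPy rendering, not the patent's code):

```python
import numpy as np

def direct_binarize(gray):
    """Direct binarization with the fixed threshold 127, per method one:
    pixels below 127 become 0 (black), the rest 255 (white)."""
    return np.where(gray < 127, 0, 255).astype(np.uint8)
```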
The second method comprises the following steps: binarization based on mean value K
After graying the image, the average value K of the pixels in the image is calculated; each pixel of the image is then scanned, and its gray value is set to 255 (white) if it is larger than K, and to 0 (black) if it is smaller than or equal to K. This method takes the average as the binarization threshold; although simple, it may lose part of the object pixels or background pixels, and the binarization result may fail to faithfully reflect the source image information.
The third method comprises the following steps: method of variance between classes of maxima
Assuming that an image is composed of a foreground region and a background region, the between-class variance of the foreground and background gray levels is computed for each candidate segmentation threshold (generally traversing the interval [0, 255]); the gray threshold that maximizes this variance is the binarization threshold sought.
Scanning each pixel value of the image, if the gray value is larger than the binarization threshold value, setting the gray value of the pixel to be 255 (white), and if the gray value is smaller than or equal to the binarization threshold value, setting the gray value of the pixel to be 0 (black).
The maximum inter-class variance method is a classical binarization method, and obtains a good balance effect between the calculation speed and the binarization effect, but as a global binarization method, the method has a poor effect on images with complex texture information or low contrast.
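Method three can be sketched as an exhaustive threshold search (an illustrative implementation of the maximum inter-class variance idea; production code would typically call an optimized library routine instead):

```python
import numpy as np

def otsu_threshold(gray):
    """Maximum inter-class variance (Otsu) threshold, per method three:
    traverse every candidate threshold in [0, 255] and keep the one that
    maximizes the between-class variance of background and foreground."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = hist[: t + 1].sum()            # background pixel count
        w1 = total - w0                     # foreground pixel count
        if w0 == 0 or w1 == 0:
            continue                        # degenerate split, skip
        mu0 = (np.arange(t + 1) * hist[: t + 1]).sum() / w0
        mu1 = (np.arange(t + 1, 256) * hist[t + 1:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

The returned threshold is then applied exactly as in the scanning step described above (pixels above it become white, the rest black).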
The common binarization methods described above, or other binarization methods, may be applied in step 102. To obtain a better binarization effect, in step 102 the embodiment of the present invention performs binarization processing on the picture to be processed using a binarization method based on a sliding window and a binarization method based on color value statistics.
Please refer to fig. 4, which shows a flowchart of a binarization method based on a sliding window, comprising:
and T1, setting a window at a preset position of the picture to be processed.
The size and shape of the window can be set according to actual needs, and the window is set at a preset position by taking the window containing M x N pixels as an example. The preset position can also be set according to actual needs, specifically, the preset position can be located at the upper left corner or the lower right corner of the picture to be processed, and for the picture to be processed with the width of M pixels, the preset position can be located at the leftmost end or the rightmost end of the picture to be processed.
And T2, judging whether the pixels in the window and the related pixels belong to continuous patterns.
The related pixels are pixels outside the window and adjacent to the window. The purpose of step T2 is to determine whether M × N pixels and the relevant pixels in the to-be-processed picture that fall within the window belong to a continuous pattern, and if not, determine that the window contains text.
And T3, if not, carrying out local binarization on the pixels in the window.
If the window contains characters, binarization processing is performed on pixels in the window. In this embodiment, only the binarization effect of the text portion in the picture to be processed is concerned, and therefore, if the window does not contain text, the window is not processed, or the window is directly set to a uniform gray level.
And T4, judging whether the window reaches the end point of the preset track.
And the window slides according to a preset track, and the sliding is finished when the window reaches the end of the preset track.
And T5, if not, sliding the window according to a preset track.
The preset track can be set according to actual needs. For a picture to be processed with a width of M pixels, the window may be moved along its length.
And T6, returning to the step T2.
Specifically, the local binarization method used in step T3 may be implemented by using an existing binarization method, in this embodiment, please refer to fig. 5, which shows a flowchart of the local binarization method in step T3 in this embodiment, and includes:
and T31, obtaining the color distribution statistical result of the pixels in the window.
And T32, setting a threshold value according to the statistical result, wherein the threshold value is used for distinguishing the foreground and the background of the picture to be processed.
Assuming the picture is composed of a foreground area and a background area, the threshold is chosen to separate foreground pixels from background pixels. For example, pixels whose color is larger than the threshold are classified as foreground pixels and the rest as background pixels, or pixels whose color is smaller than the threshold are classified as foreground pixels and the rest as background pixels. The threshold may be chosen so that, after segmentation based on it, the color mean of the foreground pixels and that of the background pixels differ maximally.
And T33, carrying out binarization on the pixels in the window according to the threshold value.
Specifically, the foreground pixels may be set to 255 (white) and the background pixels to 0 (black), or the foreground pixels may be set to 0 (black) and the background pixels to 255 (white).
The binarization method based on the sliding window belongs to a local self-adaptive method, is more suitable for segmenting scenes of text lines in a picture, and can obtain better binarization effect under the scenes of single color information and uncomplicated background textures.
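The sliding-window procedure of steps T1 to T6 can be sketched under simplifying assumptions: the window spans the full image height and slides left to right in fixed steps (one possible preset track), and the continuous-pattern test of step T2 is stood in for by a caller-replaceable heuristic, since the patent does not fix its implementation:

```python
import numpy as np

def sliding_window_binarize(gray, win_w=16, contains_text=None):
    """Sliding-window binarization sketch. Windows judged to contain text
    (step T2 stand-in) get a local threshold from their own color
    statistics (steps T31-T33); other windows are left at a uniform gray,
    as the text allows."""
    if contains_text is None:
        contains_text = lambda w: w.std() > 20   # hypothetical heuristic
    out = np.full_like(gray, 255)                # uniform gray for non-text
    h, w = gray.shape
    for x in range(0, w, win_w):                 # T4/T5: slide along track
        window = gray[:, x : x + win_w]
        if contains_text(window):                # T2: window holds characters?
            t = window.mean()                    # T31/T32: local threshold
            out[:, x : x + win_w] = np.where(window > t, 255, 0)  # T33
    return out
```

Swapping `contains_text` for a real continuous-pattern detector, and the mean threshold for the maximal-mean-difference rule described above, stays within the same skeleton.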
Please refer to fig. 6, which shows a flowchart of a binarization method based on color value statistics, comprising:
and P1, obtaining a color distribution statistical result of the pixels of the picture to be processed.
And P2, based on the color distribution statistical result, obtaining two target colors by using a preset color clustering algorithm.
Clustering aggregates data by grouping similar items into one class. It is an unsupervised classification approach, and its advantage is that no prior training process is required. In general, a color clustering algorithm narrows the range of the color space and increases the distance between colors, yielding the color clustering result (the target colors); methods commonly used for color clustering include K-means, Gaussian Mixture Models (GMM), and mean shift.
And P3, setting a foreground color and a background color according to the two target colors.
And P4, sequentially calculating the first distance and the second distance of the pixels of the picture to be processed, and judging the attribution of the pixels according to the calculation result.
Specifically, the first distance is a euclidean distance between the color of the pixel and the foreground color, and the second distance is a euclidean distance between the color of the pixel and the background color. If the first distance is smaller than the second distance, the pixel is judged to belong to the foreground; and if the first distance is greater than the second distance, judging that the pixel belongs to the background.
And P5, binarizing the pixels in the picture to be processed according to the judgment result.
The picture is assumed to be composed of a foreground area and a background area, and each pixel in the picture is judged to be a foreground pixel or a background pixel by calculating its first distance and second distance. Specifically, the foreground pixels may be set to 255 (white) and the background pixels to 0 (black), or the foreground pixels may be set to 0 (black) and the background pixels to 255 (white).
The binarization method based on color value statistics provided by the embodiment of the invention belongs to a global method, and can be suitable for complex scenes due to the fact that the target color is calculated by a clustering method, and the application range is wide.
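Steps P1 to P5 can be sketched with a minimal two-center K-means (K-means being one of the clustering options the text names). The choices of initializing the centers at the darkest and brightest pixels, and of treating the darker target color as the foreground, are assumptions of this sketch, not requirements of the patent:

```python
import numpy as np

def cluster_binarize(img, iters=10):
    """Color-value-statistics binarization sketch: cluster RGB pixels to
    two target colors (P2), set the darker one as foreground (P3, an
    assumption), assign each pixel by Euclidean distance (P4), binarize (P5)."""
    pixels = img.reshape(-1, 3).astype(float)
    bright = pixels.sum(axis=1)
    # initialize the two centers at the darkest and brightest pixels
    centers = np.array([pixels[bright.argmin()], pixels[bright.argmax()]])
    for _ in range(iters):
        # P4: Euclidean distance of every pixel to both target colors
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # nearer target color wins
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = pixels[labels == k].mean(axis=0)
    fg = centers.sum(axis=1).argmin()             # darker center = foreground
    out = np.where(labels == fg, 0, 255).astype(np.uint8)  # P5
    return out.reshape(img.shape[:2])
```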
Further, in step S104, the present invention evaluates the processing result of the binarization processing performed on the picture to be processed by using the sliding window-based binarization method and the color value statistics-based binarization method through a deep learning engine implemented based on a Convolutional Neural Network (CNN), so as to obtain a text confidence of the processing result of the binarization method based on the sliding window and a text confidence of the processing result of the binarization method based on the color value statistics, respectively.
A Convolutional Neural Network (CNN) is one of Network structures that are very representative in deep learning technology, and has achieved great success in the field of image processing, and on an international standard ImageNet data set, many successful models are based on CNN, and the deep learning engine used in this embodiment is also based on the Convolutional Neural Network CNN. One of the advantages of CNN over conventional image processing algorithms is that it avoids complex pre-processing procedures (extraction of artificial features, etc.) on the image, and can directly input the original image and output the confidence for a single character.
In image processing, an image is generally regarded as one or more two-dimensional vectors. A grayed picture can be regarded as a single two-dimensional vector whose elements are the pixels' gray values; a color picture represented in RGB (the RGB color scheme is an industry color standard) has three color channels and can be represented as three two-dimensional vectors. A conventional neural network adopts full connection, that is, every neuron of the input layer is connected to every neuron of the hidden layer, so the parameter count is huge, and training the network is time-consuming or even infeasible.
In the convolutional neural network CNN in this embodiment, there are mainly two types of network layers, which are a convolutional layer and a pooling/sampling layer. The convolution layer is used for extracting various characteristics of the image; the pooling layer is used for abstracting the original characteristic signals, so that training parameters are greatly reduced, and the overfitting degree of the model can be reduced.
A convolutional layer is computed by sliding convolution kernels window by window over the previous input layer. Each parameter in a convolution kernel is equivalent to a weight parameter in a traditional neural network and is connected to the corresponding local pixel; summing the products of the kernel parameters and the corresponding local pixel values (usually with a bias parameter added) gives one result of the convolutional layer.
After the features of the image are obtained by the convolutional layer, the convolutional layer is subjected to pooling/sampling in order to further reduce the network training parameters and the degree of overfitting of the model of the CNN-based deep learning engine in this embodiment. The pooling/sampling approach is typically one of the following two:
Max-Pooling: the maximum value within the pooling window is selected as the sampled value;
Mean-Pooling: all values within the pooling window are averaged, and the mean is taken as the sampled value.
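Both sampling approaches can be sketched as follows (a minimal illustration assuming non-overlapping windows, which is the common case; the `pool2d` helper is illustrative, not from the patent):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Partition the map into non-overlapping size*size windows and sample one
    value per window: the maximum (Max-Pooling) or the mean (Mean-Pooling)."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]   # drop any ragged edge
    blocks = fm.reshape(fm.shape[0] // size, size, fm.shape[1] // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1., 2., 5., 6.],
               [3., 4., 7., 8.],
               [0., 0., 1., 1.],
               [0., 4., 1., 1.]])
print(pool2d(fm, 2, "max"))   # [[4. 8.] [4. 1.]]
print(pool2d(fm, 2, "mean"))  # [[2.5 6.5] [1.  1. ]]
```

A 2*2 window halves each spatial dimension, which is exactly the C1 → S2 and C3 → S4 size reduction described below.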
Please refer to fig. 7, which shows a structure diagram of the convolutional neural network in this embodiment; a classical convolutional neural network structure is adopted.
The C1 layer is a convolutional layer that yields 6 feature maps. Each neuron in each feature map is connected to a 5*5 neighborhood of the input, and each feature map is 28 × 28 in size; each convolution neuron has 25 weight parameters and one bias parameter. There are 122,304 connections.
The S2 layer is a down-sampling layer with 6 feature maps of size 14 × 14. Each unit in each map is connected to a non-overlapping 2*2 neighborhood of the corresponding C1 feature map, so each feature map in the S2 layer is 1/4 the size of the feature map in the C1 layer. The 4 inputs of each S2 unit are summed, multiplied by a trainable parameter W, added to a trainable bias b, and passed through a sigmoid function. The S2 layer has 5,880 connections.
The C3 layer is a convolutional layer with 16 convolution kernels, yielding 16 feature maps of size 10 × 10; each neuron in each feature map is connected to several 5*5 neighborhoods of certain feature maps in S2.
The S4 layer is a down-sampling layer composed of 16 feature maps of size 5*5; each unit in a feature map is connected to a 2*2 neighborhood of the corresponding feature map in C3. The number of connections is 2,000.
The C5 layer is a convolutional layer comprising 120 neurons and 120 feature maps, each of size 1*1. Each unit is connected to the 5*5 neighborhoods of all 16 feature maps of the S4 layer, for a total of 48,120 connections.
The F6 layer has 84 units, is fully connected to the C5 layer, and has 10,164 connections.
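Assuming a 32 × 32 input (which the 28 × 28 C1 maps and the 5*5 kernels imply, as in the classical structure; the patent does not state the input size), the feature-map sizes of the layers above can be checked arithmetically:

```python
def conv_out(n, k):  # valid convolution with a k*k kernel
    return n - k + 1

def pool_out(n, s):  # non-overlapping s*s down-sampling window
    return n // s

n = 32              # 32 x 32 input, assumed for this classical structure
n = conv_out(n, 5)  # C1: 5*5 kernels -> 28 x 28 feature maps
print(n)            # 28
n = pool_out(n, 2)  # S2: 2*2 windows -> 14 x 14
print(n)            # 14
n = conv_out(n, 5)  # C3: 5*5 kernels -> 10 x 10
print(n)            # 10
n = pool_out(n, 2)  # S4: 2*2 windows -> 5 x 5
print(n)            # 5
n = conv_out(n, 5)  # C5: 5*5 kernels -> 1 x 1
print(n)            # 1
```

Each convolution shrinks the map by 4 (kernel size minus 1) and each pooling halves it, which reproduces the 28, 14, 10, 5, 1 sizes of layers C1 through C5.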
After the picture to be processed has been binarized, the processing results need to be evaluated. In the prior art, such evaluation seldom achieves high accuracy in scenes with low contrast or complex texture. The CNN-based deep learning engine provided in this embodiment is a deep learning neural network trained on big data; the confidence it outputs is both accurate and fast to compute, which overcomes the prior art's demanding scene requirements and poor accuracy. Evaluating the processing results of step 103 with this engine is therefore more robust and more accurate than traditional evaluation of binarization results.
Based on this engine, the embodiment of the present invention can adaptively calculate the text confidence of the processing result of the sliding-window-based binarization method and of the processing result of the color-value-statistics-based binarization method, and thus select a processing result in step 105.
In one scenario of an embodiment of the present invention, please refer to fig. 8, which shows a picture to be processed. Fig. 9 shows the processing result of the sliding-window-based binarization method for the picture in fig. 8, and fig. 10 shows the processing result of the color-value-statistics-based binarization method for the same picture. Fig. 9 and fig. 10 are input into the deep learning engine implemented based on a convolutional neural network (CNN) to obtain the confidence of each character, from which the text confidence of each figure is calculated. In this embodiment, the text confidence of fig. 9 is 0.88 and that of fig. 10 is 0.97; the processing result of fig. 10 is therefore selected as the binarization result of the picture in fig. 8.
In another scenario of the embodiment of the present invention, please refer to fig. 11, which shows a picture to be processed. Fig. 12 shows the processing result of the sliding-window-based binarization method for the picture in fig. 11, and fig. 13 shows the processing result of the color-value-statistics-based binarization method for the same picture. Fig. 12 and fig. 13 are input into the deep learning engine implemented based on a convolutional neural network (CNN) to obtain the confidence of each character, from which the text confidence of each figure is calculated. In this embodiment, the text confidence of fig. 12 is 0.99 and that of fig. 13 is 0.94; the processing result of fig. 12 is therefore selected as the binarization result of the picture in fig. 11.
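The selection logic of these two scenarios can be sketched as follows (the method names and the `results` mapping are illustrative, not part of the patent):

```python
def select_binarization(results):
    """Pick the processing result whose text confidence is highest.
    `results` maps a method name to (binarized_image, text_confidence),
    where the confidence would come from the CNN-based engine."""
    best = max(results, key=lambda name: results[name][1])
    return best, results[best][0]

# Scenario 1: 0.88 (sliding window) vs 0.97 (color value statistics)
name, _ = select_binarization({
    "sliding_window": ("result_fig9", 0.88),
    "color_statistics": ("result_fig10", 0.97),
})
print(name)  # color_statistics
```

Because selection happens per picture, the method that wins can differ from one picture to the next, which is exactly the adaptive behavior shown by the two scenarios.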
In the embodiment of the invention, the picture to be processed is independently processed by multiple strongly complementary binarization methods; the confidence of each character is then obtained with a learning engine based on optical character recognition, and the text confidence of each result is calculated, so that the optimal processing result can be selected dynamically. Seamless switching among the processing results of the various binarization methods is achieved without attending to global information or local texture.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 14, a block diagram of a binarization apparatus for a picture is shown. The apparatus has the function of implementing the above method; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include:
a to-be-processed picture obtaining module 201, configured to obtain a to-be-processed picture. Which may be used to perform step 101.
A processing result obtaining module 202, configured to perform independent binarization processing on the to-be-processed picture by using a plurality of preset binarization processing methods, respectively, where each binarization method obtains a processing result. Which may be used to perform step 102.
A processing result set obtaining module 203, configured to obtain a processing result set according to the processing result. Which may be used to perform step 103.
A word confidence calculation module 204, configured to calculate a word confidence of each processing result in the processing result set. Which may be used to perform step 104.
A binarization result obtaining module 205, configured to select a processing result with the highest text confidence as a binarization result for the to-be-processed picture. Which may be used to perform step 105.
Further, the text confidence calculation module 204 includes:
the confidence obtaining unit 2041 is configured to obtain the confidence of each word in the processing result. Which may be used to perform step 1041.
The text confidence calculating unit 2042 is configured to calculate a text confidence of the processing result according to a preset text confidence algorithm and a confidence of each text. Which may be used to perform step 1042.
Referring to fig. 15, which shows a block diagram of a text confidence calculation unit, the text confidence calculation unit 2042 may include:
the weight setting module 20421 is configured to set a weight corresponding to each character. Which may be used to perform step S1.
The average confidence calculating module 20422 is configured to calculate a weighted average confidence of the processing result. Which may be used to perform steps S2 and S3.
A text confidence coefficient obtaining module 20423, configured to use the weighted average confidence coefficient as a text confidence coefficient. Which may be used to perform step S4.
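Steps S1-S4 carried out by these modules can be sketched as follows; note that, per the described algorithm, the weighted sum is divided by the number of characters rather than by the sum of the weights, and equal weights of 1 are assumed when none are given:

```python
def text_confidence(char_confidences, weights=None):
    """Steps S1-S4: weight each per-character confidence, sum, and divide by
    the character count to obtain the weighted average confidence."""
    if weights is None:
        weights = [1.0] * len(char_confidences)   # S1: default equal weights
    weighted_sum = sum(c * w for c, w in zip(char_confidences, weights))  # S2
    return weighted_sum / len(char_confidences)   # S3/S4: divide by count

print(text_confidence([0.5, 1.0, 0.75]))  # 0.75
```

With non-unit weights the result can exceed 1, since the divisor is the character count; whether the weights should be normalized is not specified here.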
Referring to fig. 16, a block diagram of a processing result obtaining module 202 is shown, which includes:
and the sliding window binarization unit 2021 is used for performing binarization processing on the picture to be processed based on a sliding window binarization method.
And the color value statistics and binarization unit 2022 is used for performing binarization processing on the picture to be processed by using a binarization method based on color value statistics.
Specifically, referring to fig. 17, which shows a block diagram of the sliding window binarization unit 2021, the unit includes:
the window setting module 20211 is configured to set a window at a preset position of the to-be-processed picture. Which may be used to perform step T1.
A first determining module 20212, configured to determine whether the pixels in the window and the related pixels belong to a continuous pattern; the related pixels are pixels outside the window and adjacent to the window. Which may be used to perform step T2.
A local binarization module 20213, configured to perform local binarization on the pixels in the window. Which may be used to perform step T3.
A second determining module 20214, configured to determine whether the window is slid to reach an end point of the preset track. Which may be used to perform step T4.
A moving module 20215, configured to move the window according to a preset track. Which may be used to perform step T5.
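A simplified sketch of steps T1-T5 carried out by these modules follows; the continuity judgment of step T2 is omitted here, and the raster-order track and the mean-based local threshold are assumptions not specified by the patent:

```python
import numpy as np

def sliding_window_binarize(gray, win=16, threshold_fn=None):
    """Slide a window over a preset track (raster order here) and binarize
    each window locally with its own threshold (the window mean here)."""
    if threshold_fn is None:
        threshold_fn = lambda patch: patch.mean()
    out = np.zeros_like(gray, dtype=np.uint8)
    for y in range(0, gray.shape[0], win):          # T5: move along the track
        for x in range(0, gray.shape[1], win):
            patch = gray[y:y+win, x:x+win]          # T1: window at a position
            t = threshold_fn(patch)                 # T3: local binarization
            out[y:y+win, x:x+win] = (patch > t) * 255
    return out                                      # T4: end of track reached

gray = np.array([[10., 10., 200., 200.],
                 [10., 10., 200., 200.]])
print(sliding_window_binarize(gray, win=4))
```

The point of per-window thresholds is that each region is binarized against its own local statistics, which is what lets the method cope with uneven lighting.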
Specifically, referring to fig. 18, which shows a block diagram of a color value statistical binarization unit 2022, the color value statistical binarization unit includes:
a statistical result obtaining module 20221, configured to obtain a color distribution statistical result of the pixels of the to-be-processed picture. Which can be used to perform step P1.
A target color obtaining module 20222, configured to obtain two target colors by using a preset color clustering algorithm based on the color distribution statistical result. Which can be used to perform step P2.
A setting module 20223, configured to set a foreground color and a background color according to the two target colors. Which may be used to perform step P3.
The determining module 20224 is configured to sequentially calculate a first distance and a second distance of a pixel of the to-be-processed picture, and determine an attribute of the pixel according to a calculation result; the first distance is a euclidean distance between the color of the pixel and the foreground color, and the second distance is a euclidean distance between the color of the pixel and the background color. Which may be used to perform step P4.
A binarization module 20225, configured to binarize, according to the determination result, the pixel in the to-be-processed picture. Which may be used to perform step P5.
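Steps P1-P5 carried out by these modules can be sketched as follows; a minimal two-cluster iteration stands in for the unspecified preset color clustering algorithm, and taking the darker cluster as the foreground is an assumption:

```python
import numpy as np

def color_statistics_binarize(img):
    """P1/P2: cluster the pixel colors into two target colors; P3: take them
    as foreground and background; P4/P5: assign each pixel to whichever
    target color is nearer in Euclidean distance, then binarize."""
    pixels = img.reshape(-1, 3).astype(float)
    c0, c1 = pixels.min(axis=0), pixels.max(axis=0)  # initialize at extremes
    for _ in range(10):                              # minimal 2-means loop
        d0 = np.linalg.norm(pixels - c0, axis=1)
        d1 = np.linalg.norm(pixels - c1, axis=1)
        mask = d0 < d1
        if mask.any():
            c0 = pixels[mask].mean(axis=0)
        if (~mask).any():
            c1 = pixels[~mask].mean(axis=0)
    fg, bg = (c0, c1) if c0.sum() < c1.sum() else (c1, c0)  # darker = fg
    d_fg = np.linalg.norm(pixels - fg, axis=1)  # first distance
    d_bg = np.linalg.norm(pixels - bg, axis=1)  # second distance
    return np.where(d_fg < d_bg, 0, 255).reshape(img.shape[:2]).astype(np.uint8)

img = np.array([[[20, 20, 20], [230, 230, 230]],
                [[25, 25, 25], [240, 235, 230]]], dtype=np.uint8)
print(color_statistics_binarize(img))
```

Because the comparison is in full RGB space rather than on a single gray threshold, pixels of similar brightness but different hue can still be separated.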
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 19, a schematic structural diagram of a terminal according to an embodiment of the present invention is shown. The terminal is used for implementing the binarization method of the picture provided in the embodiment.
The terminal may include components such as an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, and a power supply 190. Those skilled in the art will appreciate that the terminal structure shown in fig. 19 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the RF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information from a base station and then sends the received downlink information to the one or more processors 180 for processing; in addition, data relating to uplink is transmitted to the base station. In general, the RF circuitry 110 includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry 110 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), and the like.
The memory 120 may be used to store software programs and modules, and the processor 180 executes various functional applications and data processing by operating the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 130 may include a touch-sensitive surface 131 as well as other input devices 132. The touch-sensitive surface 131, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near the touch-sensitive surface 131 (e.g., operations by a user on or near the touch-sensitive surface 131 using a finger, a stylus, or any other suitable object or attachment), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface 131 may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 180, and can receive and execute commands sent by the processor 180. Additionally, the touch sensitive surface 131 may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. In addition to touch-sensitive surface 131, input unit 130 may include other input devices 132. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information input by or provided to a user and various graphic user interfaces of the terminal, which may be configured by graphics, text, icons, video, and any combination thereof. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may cover the display panel 141, and when a touch operation is detected on or near the touch-sensitive surface 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in FIG. 19, touch-sensitive surface 131 and display panel 141 are shown as two separate components to implement input and output functions, in some embodiments, touch-sensitive surface 131 may be integrated with display panel 141 to implement input and output functions.
The terminal may also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or a backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when the terminal is stationary, and can be used for applications of recognizing terminal gestures (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal, detailed description is omitted here.
Audio circuitry 160, speaker 161, microphone 162 may provide an audio interface between a user and the terminal. The audio circuit 160 may transmit the electrical signal converted from the received audio data to the speaker 161, and convert the electrical signal into a sound signal for output by the speaker 161; on the other hand, the microphone 162 converts the collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data, and then the audio data is processed by the audio data output processor 180, and then sent to another terminal via the RF circuit 110, or the audio data is output to the memory 120 for further processing. The audio circuitry 160 may also include an earbud jack to provide communication of peripheral headphones with the terminal.
WiFi belongs to a short-distance wireless transmission technology, and the terminal can help a user to send and receive e-mails, browse webpages, access streaming media and the like through the WiFi module 170, and provides wireless broadband internet access for the user. Although fig. 19 shows the WiFi module 170, it is understood that it does not belong to the essential constitution of the terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 180 is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, performs various functions of the terminal and processes data by operating or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby integrally monitoring the terminal. Optionally, processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.
The terminal also includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 180 via a power management system to manage charging, discharging, and power consumption management functions via the power management system. The power supply 190 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, and the like, which are not described herein again. Specifically, in this embodiment, the display unit of the terminal is a touch screen display, the terminal further includes a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, where the one or more programs include instructions for executing the binarization method of the picture.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of a terminal to perform the steps of the above method embodiments is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It should be understood that reference herein to "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (15)

1. A binarization method of a picture is characterized by comprising the following steps:
acquiring a picture to be processed, wherein the picture to be processed comprises characters;
respectively carrying out independent binarization processing on the picture to be processed by using a plurality of preset binarization processing methods, wherein each binarization method obtains a processing result;
obtaining a processing result set according to the processing result;
calculating the character confidence of each processing result in the processing result set;
selecting a processing result with the highest character confidence coefficient as a binarization result of the picture to be processed;
the preset binarization processing method comprises a binarization method based on a sliding window, and the binarization method based on the sliding window comprises the following steps: setting a window at a preset position of the picture to be processed; judging whether the pixels in the window and the related pixels belong to continuous patterns or not; the related pixels are pixels outside the window and adjacent to the window; if not, carrying out local binarization on the pixels in the window; judging whether the window reaches the end point of the preset track or not; if not, sliding the window according to a preset track; and returning to the step of judging whether the pixels in the window and the adjacent pixels outside the window belong to continuous patterns or not.
2. The method of claim 1, wherein said calculating a textual confidence for each of the set of processing results comprises:
obtaining the confidence of each character in the processing result;
and calculating the character confidence of the processing result according to a preset character confidence algorithm and the confidence of each character.
3. The method of claim 2, wherein obtaining the confidence level for each word in the processing result comprises:
inputting the processing result into a preset learning engine based on optical character recognition;
and obtaining the confidence degree output by the learning engine.
4. The method of claim 2, wherein the calculating the text confidence of the processing result according to a preset text confidence algorithm and the confidence of each text comprises:
setting a weight corresponding to each character in a processing result;
calculating a weighted average confidence for the processing result: carrying out weighted summation on the confidence coefficient according to the confidence coefficient of each character and the weight value corresponding to the character; dividing the result of weighted summation by the number of characters in the processing result to obtain weighted average confidence;
and taking the weighted average confidence as the character confidence.
5. The method according to claim 1, wherein the preset binarization processing method further comprises a binarization method based on color value statistics.
6. The method according to claim 1, wherein the local binarization comprises:
obtaining a color distribution statistical result of the pixels in the window;
setting a threshold value according to the statistical result, wherein the threshold value is used for distinguishing the foreground and the background of the picture to be processed;
and carrying out binarization on the pixels in the window according to the threshold value.
7. The method of claim 5, wherein the binarization method based on color value statistics comprises:
obtaining a color distribution statistical result of the pixels of the picture to be processed;
based on the color distribution statistical result, obtaining two target colors by using a preset color clustering algorithm;
setting a foreground color and a background color according to the two target colors;
sequentially calculating a first distance and a second distance of pixels of the picture to be processed, and judging the attribution of the pixels according to a calculation result; the first distance is a euclidean distance between the color of the pixel and the foreground color, and the second distance is a euclidean distance between the color of the pixel and the background color;
and carrying out binarization on the pixels in the picture to be processed according to the judgment result.
8. The method according to claim 7, wherein the sequentially calculating a first distance and a second distance of a pixel of the picture to be processed, and determining the attribution of the pixel according to the calculation result comprises:
if the first distance is smaller than the second distance, the pixel is judged to belong to the foreground;
and if the first distance is greater than the second distance, judging that the pixel belongs to the background.
9. An apparatus for binarizing a picture, the apparatus comprising:
the image to be processed acquisition module is used for acquiring an image to be processed;
the processing result obtaining module is used for carrying out independent binarization processing on the picture to be processed by using a plurality of preset binarization processing methods respectively, and each binarization method obtains a processing result;
a processing result set obtaining module, configured to obtain a processing result set according to the processing result;
the character confidence coefficient calculation module is used for calculating the character confidence coefficient of each processing result in the processing result set;
a binarization result obtaining module for selecting the processing result with the highest character confidence as the binarization result of the picture to be processed;
the processing result obtaining module comprises: the sliding window binarization unit is used for carrying out binarization processing on the picture to be processed based on a binarization method of a sliding window; the sliding window binarization unit includes: the window setting module is used for setting a window at a preset position of the picture to be processed; the first judging module is used for judging whether the pixels in the window and the related pixels belong to continuous patterns or not; the related pixels are pixels outside the window and adjacent to the window; the local binarization module is used for carrying out local binarization on the pixels in the window; the second judgment module is used for judging whether the sliding window reaches the end point of the preset track; and the moving module is used for moving the window according to a preset track.
10. The apparatus of claim 9, wherein the text confidence calculation module comprises:
the confidence coefficient acquisition unit is used for acquiring the confidence coefficient of each character in the processing result;
and the character confidence coefficient calculating unit is used for calculating the character confidence coefficient of the processing result according to a preset character confidence coefficient algorithm and the confidence coefficient of each character.
11. The apparatus of claim 9, wherein the text confidence calculation unit comprises:
the weight setting module is used for setting the weight corresponding to each character;
the average confidence coefficient calculation module is used for calculating the weighted average confidence coefficient of the processing result;
and the character confidence coefficient obtaining module is used for taking the weighted average confidence coefficient as a character confidence coefficient.
12. The apparatus of claim 9, wherein the processing result obtaining module further comprises:
and the color value statistics and binarization unit is used for carrying out binarization processing on the picture to be processed based on the binarization method of color value statistics.
13. The apparatus of claim 12, wherein the statistical color value binarization unit comprises:
a statistical result obtaining module, configured to obtain a color distribution statistical result of pixels of the picture to be processed;
the target color obtaining module is used for obtaining two target colors by using a preset color clustering algorithm based on the color distribution statistical result;
the setting module is used for setting a foreground color and a background color according to the two target colors;
the judging module is used for calculating, for each pixel of the picture to be processed in sequence, a first distance and a second distance, and judging the attribution of the pixel according to the calculation result; the first distance is the Euclidean distance between the color of the pixel and the foreground color, and the second distance is the Euclidean distance between the color of the pixel and the background color;
and the binarization module is used for binarizing the pixels in the picture to be processed according to the judgment result.
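The color-statistics path of claims 12 and 13 (obtain the color distribution of the pixels, cluster it into two target colors, set foreground and background, then assign each pixel by comparing its Euclidean distance to each target color) can be sketched with a plain two-centroid k-means loop standing in for the unspecified "preset color clustering algorithm". The function names, the darker-centroid-is-foreground convention, and the 0/1 label encoding are assumptions for illustration.

```python
import math

def color_binarize(pixels, iters=10):
    """Binarize a list of RGB tuples by clustering their colors into two.

    Sketch: a simple 2-means loop produces the two target colors; the darker
    one is taken as the foreground color, the lighter as the background
    color; each pixel then goes to whichever target color is nearer in
    Euclidean distance (first distance vs. second distance).
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Initialize the two target colors from the extremes of the distribution.
    c0 = min(pixels, key=sum)
    c1 = max(pixels, key=sum)
    for _ in range(iters):
        groups = ([], [])
        for p in pixels:
            groups[0 if dist(p, c0) <= dist(p, c1) else 1].append(p)
        for i, g in enumerate(groups):
            if g:  # recompute each target color as its group's mean color
                mean = tuple(sum(ch) / len(g) for ch in zip(*g))
                if i == 0:
                    c0 = mean
                else:
                    c1 = mean
    fg, bg = (c0, c1) if sum(c0) <= sum(c1) else (c1, c0)
    # 0 = attributed to foreground (first distance smaller), 1 = background.
    return [0 if dist(p, fg) <= dist(p, bg) else 1 for p in pixels], fg, bg
```

On a document-like image the two clusters typically converge to ink color and paper color, so the attribution step separates text from background.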
14. A picture binarization terminal, characterized by comprising the picture binarization device of any one of claims 9-13.
15. A non-transitory computer-readable storage medium comprising instructions which, when executed by a processor of a terminal, cause the terminal to perform the picture binarization method of any one of claims 1-8.
CN201710031170.XA 2017-01-17 2017-01-17 Image binarization method and device and terminal Active CN106874906B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710031170.XA CN106874906B (en) 2017-01-17 2017-01-17 Image binarization method and device and terminal
PCT/CN2018/072047 WO2018133717A1 (en) 2017-01-17 2018-01-10 Image thresholding method and device, and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710031170.XA CN106874906B (en) 2017-01-17 2017-01-17 Image binarization method and device and terminal

Publications (2)

Publication Number Publication Date
CN106874906A CN106874906A (en) 2017-06-20
CN106874906B true CN106874906B (en) 2023-02-28

Family

ID=59157628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710031170.XA Active CN106874906B (en) 2017-01-17 2017-01-17 Image binarization method and device and terminal

Country Status (2)

Country Link
CN (1) CN106874906B (en)
WO (1) WO2018133717A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874906B (en) * 2017-01-17 2023-02-28 Tencent Technology (Shanghai) Co., Ltd. Image binarization method and device and terminal
CN108255298B (en) * 2017-12-29 2021-02-19 Anhui Huishi Jintong Technology Co., Ltd. Infrared gesture recognition method and device in projection interaction system
CN109978890B (en) * 2019-02-25 2023-07-07 Ping An Technology (Shenzhen) Co., Ltd. Target extraction method and device based on image processing and terminal equipment
CN110458830B 2019-03-08 2021-02-09 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, image processing apparatus, server, and storage medium
CN110390260B (en) * 2019-06-12 2024-03-22 Ping An Technology (Shenzhen) Co., Ltd. Picture scanning piece processing method and device, computer equipment and storage medium
CN110361625B (en) * 2019-07-23 2022-01-28 Central South University Method for diagnosing open-circuit fault of inverter and electronic equipment
CN110827308A (en) * 2019-11-05 2020-02-21 Cancer Hospital, Chinese Academy of Medical Sciences Image processing method, image processing apparatus, electronic device, and storage medium
CN112364740B (en) * 2020-10-30 2024-04-19 Traffic Control Technology Co., Ltd. Unmanned aerial vehicle room monitoring method and system based on computer vision

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62267887A (en) * 1986-05-16 1987-11-20 Fuji Electric Co Ltd Binarizing device
CN104200211A * 2014-09-03 2014-12-10 Tencent Technology (Shenzhen) Co., Ltd. Image binarization method and device
CN104268512A * 2014-09-17 2015-01-07 Tsinghua University Method and device for recognizing characters in image on basis of optical character recognition
US9025897B1 * 2013-04-05 2015-05-05 Accusoft Corporation Methods and apparatus for adaptive auto image binarization
CN105374015A * 2015-10-27 2016-03-02 Hubei University of Technology Binarization method for low-quality document images based on local contrast and stroke width estimation
CN106096491A * 2016-02-04 2016-11-09 Shanghai First People's Hospital Automatic identification method for microaneurysms in color fundus images

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2756308B2 * 1989-06-30 1998-05-25 Canon Inc. Image processing device
US5768414A (en) * 1995-12-22 1998-06-16 Canon Kabushiki Kaisha Separation of touching characters in optical character recognition
US7734092B2 (en) * 2006-03-07 2010-06-08 Ancestry.Com Operations Inc. Multiple image input for optical character recognition processing systems and methods
US9355293B2 (en) * 2008-12-22 2016-05-31 Canon Kabushiki Kaisha Code detection and decoding system
CN102193918B * 2010-03-01 2014-02-12 Hanvon Technology Co., Ltd. Video retrieval method and device
CN102779276B * 2011-05-09 2015-05-20 Hanvon Technology Co., Ltd. Text image recognition method and device
CN104008384B * 2013-02-26 2017-11-14 Shandong New Beiyang Information Technology Co., Ltd. Character recognition method and character recognition device
US9626495B2 * 2014-11-17 2017-04-18 International Business Machines Corporation Authenticating a device based on availability of other authentication methods
CN204537126U * 2015-04-18 2015-08-05 Wang Xueqing Image text recognition and translation glasses
CN106874906B * 2017-01-17 2023-02-28 Tencent Technology (Shanghai) Co., Ltd. Image binarization method and device and terminal


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Text segmentation in color images with complex backgrounds"; Hu Xiaofeng et al.; Optical Technique; 2006-01-31 (No. 1); pp. 141-143, 147 *
"Research on clipping algorithms for 3D graphics and their hardware implementation"; Wu Si et al.; Electronic Technology; 2008-02-25 (No. 2); full text *
"An algorithm for fast adaptive image binarization with applications in radiotherapy imaging"; T. Sund et al.; IEEE Transactions on Medical Imaging; 2003-01-31; Vol. 22, No. 1; full text *

Also Published As

Publication number Publication date
WO2018133717A1 (en) 2018-07-26
CN106874906A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874906B (en) Image binarization method and device and terminal
CN111260665B (en) Image segmentation model training method and device
CN111476780B (en) Image detection method and device, electronic equipment and storage medium
CN106156711B (en) Text line positioning method and device
CN108234882B (en) Image blurring method and mobile terminal
CN104281833B (en) Pornographic image recognizing method and device
CN107784271B (en) Fingerprint identification method and related product
CN108875594B (en) Face image processing method, device and storage medium
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN105046254A (en) Character recognition method and apparatus
CN108764051B (en) Image processing method and device and mobile terminal
CN109104578B (en) Image processing method and mobile terminal
CN106204552A (en) The detection method of a kind of video source and device
CN110765924A (en) Living body detection method and device and computer-readable storage medium
CN112541489A (en) Image detection method and device, mobile terminal and storage medium
WO2017161496A1 (en) Fringe set searching method, device and system
CN114399813B (en) Face shielding detection method, model training method, device and electronic equipment
CN107423663A (en) A kind of image processing method and terminal
CN113421211A (en) Method for blurring light spots, terminal device and storage medium
CN110717486B (en) Text detection method and device, electronic equipment and storage medium
EP3627382A1 (en) Method for iris liveness detection and related product
CN112840308A (en) Method for optimizing font and related equipment
CN114140655A (en) Image classification method and device, storage medium and electronic equipment
CN115841575A (en) Key point detection method, device, electronic apparatus, storage medium, and program product
CN114612531A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant