Summary of the invention
The embodiments of the present invention provide a method and an apparatus for extracting a character area from an image, so as to at least solve the technical problem in the prior art that identifying a character area requires a complicated modeling process, which makes the identification process inefficient.
According to one aspect of the embodiments of the present invention, a method for extracting a character area from an image is provided, comprising: partitioning an image to be processed into blocks to obtain a plurality of macro blocks; performing a grayscale color transformation on the color value of each pixel in a macro block to obtain the gray value of each pixel; searching for text pixels in the macro block according to the gray value of each pixel; and extracting the text pixels and splicing them to obtain the character area of the image to be processed.
Further, the color value of each pixel is obtained, wherein the color value includes the color data of three color channels; and the mean of the color data of the three color channels is determined to be the gray value.
Further, the gray value that occurs most frequently in the macro block is determined, according to the gray value of each pixel in the macro block, to be the first domain color (that is, the dominant color of the macro block); and the first-type pixels having the first domain color are determined to be text pixels.
Further, the ratio of the number of first-type pixels to the number of pixels in the macro block is compared with a preset ratio; if this ratio is greater than or equal to the preset ratio, the search for text pixels stops; and if this ratio is less than the preset ratio, the search for text pixels continues in the macro block.
Further, the gray value that occurs most frequently is searched for in the macro block from which the first-type pixels have been removed, and that gray value is determined to be the second domain color; the second-type pixels having the second domain color are determined to be text pixels; the ratio of the sum of the number of first-type pixels and the number of second-type pixels to the number of pixels in the macro block is compared with the preset ratio; if this ratio is greater than or equal to the preset ratio, the search for text pixels stops; and if this ratio is less than the preset ratio, the search for text pixels continues in the macro block.
Further, when the n-th domain color has been found in the macro block, if the ratio of the total number of pixels of the n domain colors to the number of pixels in the macro block is still less than the preset ratio and n equals the domain color number threshold, it is determined that no text pixels exist in the macro block, wherein the domain color number threshold is a preset number of domain colors in a macro block.
Further, pixels whose gray values satisfy a preset condition are searched for in the macro block, wherein the preset condition includes: the difference from the gray value of the m-th domain color is within a preset range, where m ≥ 1; and the pixels whose gray values satisfy the preset condition are determined to be text pixels.
Further, a color frequency distribution map of the macro block is determined according to the gray value of each pixel in the macro block; and the gray value that occurs most frequently in the macro block is looked up in the color frequency distribution map and determined to be the first domain color.
Further, the text pixels in each macro block are marked; the text pixels marked in each macro block are spliced to obtain the character area corresponding to each macro block; and the character areas corresponding to the macro blocks are spliced to obtain the character area of the image to be processed.
According to another aspect of the embodiments of the present invention, an apparatus for extracting a character area from an image is further provided, comprising: a partitioning module, configured to partition an image to be processed into blocks to obtain a plurality of macro blocks; a transformation module, configured to perform a grayscale color transformation on the color value of each pixel in a macro block to obtain the gray value of each pixel; a search module, configured to search for text pixels in the macro block according to the gray value of each pixel; and an extraction module, configured to extract the text pixels and splice them to obtain the character area of the image to be processed.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the above method for extracting a character area from an image.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein the program, when running, executes the above method for extracting a character area from an image.
In the embodiments of the present invention, an image to be processed is partitioned into blocks to obtain a plurality of macro blocks; a grayscale color transformation is performed on the color value of each pixel in a macro block to obtain the gray value of each pixel; text pixels in the macro block are searched for according to the gray value of each pixel; and the text pixels are extracted and spliced to obtain the character block of the image to be processed. In this scheme, the image to be processed is partitioned into a plurality of macro blocks, the text pixels in each macro block are obtained separately, and the text pixels in all macro blocks are then spliced to obtain the character block of the image to be processed. Local parts of the image to be processed can thus be analyzed and a complicated modeling process is avoided. This solves the technical problem in the prior art that identifying a character area requires a complicated modeling process, which makes the identification process inefficient, and thereby improves the efficiency of extracting character blocks.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a method for extracting a character block from an image is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
Fig. 1 is a flowchart of a method for extracting a character block from an image according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: partition the image to be processed into blocks to obtain a plurality of macro blocks.
Specifically, the image to be processed is the image from which the character area is to be extracted. Once the image to be processed has been divided into a plurality of macro blocks, each macro block can be analyzed individually, so that the image to be processed is analyzed locally.
Step S104: perform a grayscale color transformation on the color value of each pixel in the macro block to obtain the gray value of each pixel.
Specifically, performing the grayscale color transformation on the color value of each pixel may mean determining the gray value of the pixel according to its color value.
In the above step, the grayscale color transformation may convert the RGB value of a pixel into the Y component of the YUV color space, that is, the gray value. After the image is converted to a grayscale image, the Y components of the text and of the background still differ considerably, so the text and the background can be distinguished by the Y component. The present application therefore extracts the character area of the image to be processed by means of the Y component.
Step S106: search for the text pixels in the macro block according to the gray value of each pixel.
The text pixels are the pixels that make up the character area of the image to be processed. Within a macro block, the color values of the background are usually scattered, whereas the color of the text is comparatively concentrated. The text pixels in the macro block can therefore be determined according to how frequently each gray value occurs among the pixels.
Step S108: extract the text pixels and splice them to obtain the character block of the image to be processed.
Specifically, the character area of the image to be processed represents the image to be processed after the background other than the text has been removed.
In an optional embodiment, the image to be processed may be divided into macro blocks of 8*8, 16*16 or 32*32. The gray value of each pixel in a macro block is then obtained, and the text pixels in the macro block are determined according to these gray values. The above steps are repeated for every macro block in the image to be processed to obtain the text pixels in each macro block. Finally, the text pixels obtained after all macro blocks have been processed are spliced to obtain the character area of the image to be processed.
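The partitioning step described above can be illustrated with a minimal Python sketch. The function name `split_into_macro_blocks`, the list-of-rows image representation, and the handling of edge blocks (which are simply left smaller when the image size is not a multiple of the block size) are assumptions made for illustration; the embodiment does not specify them.

```python
def split_into_macro_blocks(image, block_h=16, block_w=16):
    """Split a 2-D grid of pixels (a list of rows) into macro blocks.

    Returns a list of (top, left, block) tuples, where (top, left) is the
    position of the block in the image. Edge blocks may be smaller than
    block_h x block_w (an assumption; the text does not cover this case).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            block = [row[left:left + block_w]
                     for row in image[top:top + block_h]]
            blocks.append((top, left, block))
    return blocks
```

Keeping the `(top, left)` offset with each block is one possible way to make the later splicing step straightforward, since each block's result can be written back to its original position.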
As can be seen from the above, in the above embodiment of the present application, an image to be processed is partitioned into blocks to obtain a plurality of macro blocks; a grayscale color transformation is performed on the color value of each pixel in a macro block to obtain the gray value of each pixel; text pixels in the macro block are searched for according to the gray value of each pixel; and the text pixels are extracted and spliced to obtain the character block of the image to be processed. In this scheme, the image to be processed is partitioned into a plurality of macro blocks, the text pixels in each macro block are obtained separately, and the text pixels in all macro blocks are then spliced to obtain the character block of the image to be processed. Local parts of the image to be processed can thus be analyzed and a complicated modeling process is avoided. This solves the technical problem in the prior art that identifying a character area requires a complicated modeling process, which makes the identification process inefficient, and thereby improves the efficiency of extracting character blocks.
Optionally, according to the above embodiment of the present application, performing the grayscale color transformation on the color value of each pixel in the macro block to obtain the gray value of each pixel comprises:
Step S1041: obtain the color value of each pixel, wherein the color value includes the color data of three color channels.
In an optional embodiment, the color value of a pixel is (R, G, B), where R, G and B are the three color channels; the data corresponding to these three color channels constitute the color value of the pixel.
Step S1043: determine the mean of the color data of the three color channels to be the gray value.
In an optional embodiment, the gray value of a pixel can be calculated by the following formula:
Y = (R + G + B) / 3
where Y denotes the gray value of the pixel, and R, G and B denote the color data of the pixel on the R, G and B color channels, respectively.
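As a minimal sketch of steps S1041 and S1043, the gray value can be computed as the mean of the three channel values. The function names `to_gray` and `block_to_gray` are hypothetical, and the use of integer division (so that gray values stay integers and can be counted directly) is an assumption not stated in the text.

```python
def to_gray(pixel):
    """Return the gray value of an (R, G, B) pixel as the channel mean.

    Integer division is an assumption made here so that equal gray
    values can be counted exactly in the later frequency analysis.
    """
    r, g, b = pixel
    return (r + g + b) // 3


def block_to_gray(block):
    """Convert a macro block of (R, G, B) pixels to a block of gray values."""
    return [[to_gray(pixel) for pixel in row] for row in block]
```

For example, `to_gray((30, 60, 90))` evaluates to 60.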
Optionally, according to the above embodiment of the present application, searching for the text pixels in the macro block according to the gray value of each pixel comprises:
Step S1061: determine, according to the gray value of each pixel in the macro block, the gray value that occurs most frequently in the macro block to be the first domain color.
Step S1063: determine the first-type pixels having the first domain color to be text pixels. After the first-type pixels have been determined to be text pixels, the text pixels may also be marked.
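Steps S1061 and S1063 can be sketched as follows, assuming the macro block has already been converted to integer gray values. The names `first_domain_color` and `pixels_with_color` are hypothetical, and how ties between equally frequent gray values are resolved is not specified in the text, so the sketch leaves that to the standard library.

```python
from collections import Counter


def first_domain_color(gray_block):
    """Return the gray value that occurs most often in the block."""
    counts = Counter(value for row in gray_block for value in row)
    color, _ = counts.most_common(1)[0]
    return color


def pixels_with_color(gray_block, color):
    """Return the (row, column) positions whose gray value equals color."""
    return {(y, x)
            for y, row in enumerate(gray_block)
            for x, value in enumerate(row)
            if value == color}
```

The set returned by `pixels_with_color` plays the role of the marked first-type pixels.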
Optionally, according to the above embodiment of the present application, after the pixels having the first domain color are determined to be text pixels, the method further comprises:
Step S1067: compare the ratio of the number of first-type pixels to the number of pixels in the macro block with a preset ratio.
Step S1069: stop searching for text pixels if the ratio of the number of first-type pixels to the number of pixels in the macro block is greater than or equal to the preset ratio.
Specifically, if a macro block contains a character area, the text pixels occupy a certain proportion of all the pixels in the macro block. If the ratio of the number of first-type pixels to the number of pixels in the macro block is greater than or equal to the preset ratio, all the text pixels in the macro block are considered to have been found, so the search stops.
Step S1071: continue searching for text pixels in the macro block if the ratio of the number of first-type pixels to the number of pixels in the macro block is less than the preset ratio.
Specifically, if a macro block contains a character area, the text pixels occupy a certain proportion of all the pixels in the macro block. Because the text in the image to be processed cannot be assumed to be all of the same color, and on the assumption that the macro block contains text pixels, the macro block is considered to contain further text pixels when the ratio of the number of first-type pixels to the number of pixels in the macro block is less than the preset ratio, so the search continues.
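The stop-or-continue decision of steps S1067 to S1071 reduces to a single comparison, sketched below. The function name `should_stop_search` is hypothetical, and the preset ratio of 0.25 used in the usage note is only an illustrative value; the text does not give one.

```python
def should_stop_search(num_text_pixels, num_block_pixels, preset_ratio):
    """Return True when the text pixels found so far make up at least
    preset_ratio of all pixels in the macro block (search stops),
    and False otherwise (search continues)."""
    return num_text_pixels / num_block_pixels >= preset_ratio
```

With a 16*16 macro block (256 pixels) and an illustrative preset ratio of 0.25, 64 first-type pixels would end the search, whereas 63 would not.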
Optionally, according to the above embodiment of the present application, continuing to search for text pixels in the macro block if the ratio of the number of first-type pixels to the number of pixels in the macro block is less than the preset ratio comprises:
Step S1073: search for the gray value that occurs most frequently in the macro block from which the first-type pixels have been removed, and determine that gray value to be the second domain color.
Step S1075: determine the second-type pixels having the second domain color to be text pixels.
Step S1077: compare the ratio of the sum of the number of first-type pixels and the number of second-type pixels to the number of pixels in the macro block with the preset ratio.
Step S1079: stop searching for text pixels if the ratio of the sum of the number of first-type pixels and the number of second-type pixels to the number of pixels in the macro block is greater than or equal to the preset ratio.
Step S10711: continue searching for text pixels in the macro block if the ratio of the sum of the number of first-type pixels and the number of second-type pixels to the number of pixels in the macro block is less than the preset ratio.
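Step S1073 can be sketched as a frequency count that skips the pixels already assigned to an earlier domain color. The function name `next_domain_color` and the representation of excluded pixels as a set of (row, column) positions are assumptions for illustration.

```python
from collections import Counter


def next_domain_color(gray_block, excluded):
    """Return the most frequent gray value among the pixels whose
    positions are not in `excluded`, or None if no pixels remain."""
    counts = Counter(value
                     for y, row in enumerate(gray_block)
                     for x, value in enumerate(row)
                     if (y, x) not in excluded)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Calling the function with the positions of the first-type pixels as `excluded` yields the second domain color; calling it with an empty set yields the first.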
Optionally, according to the above embodiment of the present application, when the n-th domain color has been found in the macro block, if the ratio of the total number of pixels of the n domain colors to the number of pixels in the macro block is still less than the preset ratio and n equals the domain color number threshold, it is determined that no text pixels exist in the macro block, wherein the domain color number threshold is a preset number of domain colors in a macro block.
In an optional embodiment, the domain color number threshold is set to 4. If the ratio of the sum of the number of first-type pixels and the number of second-type pixels to the number of all pixels in the macro block is less than the preset ratio, the color group that occurs most frequently is searched for in the macro block from which the first-type and second-type pixels have been removed, and that color group is determined to be the third domain color. If the ratio of the sum of the numbers of first-type, second-type and third-type pixels to the number of all pixels in the macro block is less than the preset ratio, the fourth-type pixels are obtained in the same manner. If the ratio of the sum of the numbers of first-type, second-type, third-type and fourth-type pixels to the number of all pixels in the macro block is still less than the preset ratio, it is determined that no character block exists in the macro block.
Each time a domain color is obtained and its pixels are determined to be text pixels, this is done on the assumption that the macro block contains text pixels. If, after the n-th domain color has been found, the ratio of the number of pixels of all the domain colors to the number of pixels in the macro block is still less than the preset ratio, this assumption is overturned: it is determined that the macro block contains no text pixels, and the text pixels determined previously are cancelled.
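The iterative search with a domain color number threshold can be sketched as a bounded loop. The function name `find_text_pixels` and its parameter names are hypothetical; the default of 4 domain colors follows the optional embodiment above, while the preset ratio has to be supplied by the caller because the text gives no value. Returning an empty set models the cancellation of previously determined text pixels.

```python
from collections import Counter


def find_text_pixels(gray_block, preset_ratio, max_domain_colors=4):
    """Collect pixels of successive domain colors until they cover at
    least preset_ratio of the block; return an empty set if the limit
    on the number of domain colors is reached first."""
    remaining = {(y, x): value
                 for y, row in enumerate(gray_block)
                 for x, value in enumerate(row)}
    total = len(remaining)
    text_pixels = set()
    for _ in range(max_domain_colors):
        if not remaining:
            break
        # Most frequent gray value among the pixels not yet assigned.
        color = Counter(remaining.values()).most_common(1)[0][0]
        matched = {pos for pos, value in remaining.items() if value == color}
        text_pixels |= matched
        for pos in matched:
            del remaining[pos]
        if len(text_pixels) / total >= preset_ratio:
            return text_pixels
    # Assumption overturned: the block is treated as containing no text,
    # and the pixels collected so far are discarded.
    return set()
```

One consequence of this rule is that a block whose gray values are widely scattered, as is typical for pure background according to the description, never reaches the preset ratio and is reported as containing no text.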
Optionally, according to the above embodiment of the present application, after the m-th domain color has been found in the macro block, the method further comprises:
Step S10611: search the macro block for pixels whose gray values satisfy a preset condition, wherein the preset condition includes: the difference from the gray value of the m-th domain color is within a preset range, where m ≥ 1.
Step S10613: determine the pixels whose gray values satisfy the preset condition to be text pixels.
In the above scheme, the boundary between text and background in an actual image is not necessarily sharp, and a gradient transition may appear at the boundary when the text is rendered. Pixels whose gray values differ only slightly from a domain color may therefore also be treated as text pixels.
The above steps may be performed each time a domain color is found. In an optional embodiment, taking the first domain color as an example, after the first domain color is determined, pixels whose gray value differs from that of the first domain color by less than Δ are searched for in the macro block (for example, if Δ = 4, the preset range is (0, 4)), and the pixels found are also treated as first-type pixels.
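The tolerance rule can be sketched as a filter on the absolute gray-value difference. The function name `pixels_near_color` is hypothetical. The sketch uses a strict comparison (difference less than Δ), following the wording of this paragraph, and it also returns the pixels whose difference is zero, that is, the pixels of the domain color itself.

```python
def pixels_near_color(gray_block, color, delta=4):
    """Return the positions of pixels whose gray value differs from
    `color` by less than `delta` (the domain color itself included)."""
    return {(y, x)
            for y, row in enumerate(gray_block)
            for x, value in enumerate(row)
            if abs(value - color) < delta}
```

With Δ = 4 and a domain color of 100, gray values 97 to 103 are accepted, while 96 and 104 are not.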
Optionally, according to the above embodiment of the present application, determining, according to the gray value of each pixel in the macro block, the gray value that occurs most frequently in the macro block to be the first domain color comprises:
Step S10615: determine the color frequency distribution map of the macro block according to the gray value of each pixel in the macro block.
Step S10617: look up, in the color frequency distribution map, the gray value that occurs most frequently in the macro block, and determine the gray value found to be the first domain color.
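A color frequency distribution map can be sketched as a histogram indexed by gray value. The function names `gray_histogram` and `most_frequent_gray` are hypothetical, and the default of 256 levels assumes 8-bit integer gray values, which the text does not state. On a tie, this sketch returns the lowest of the tied gray values; the text does not say how ties are resolved.

```python
def gray_histogram(gray_block, levels=256):
    """Count how often each gray value in 0..levels-1 occurs in the block."""
    hist = [0] * levels
    for row in gray_block:
        for value in row:
            hist[value] += 1
    return hist


def most_frequent_gray(hist):
    """Return the gray value with the highest count in the histogram."""
    return max(range(len(hist)), key=hist.__getitem__)
```

A fixed-size histogram of this kind can be reused for every domain color in a block by subtracting the counts of the pixels that have already been assigned.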
Optionally, according to the above embodiment of the present application, extracting the text pixels and splicing them to obtain the character block of the image comprises:
Step S1081: mark the text pixels in each macro block.
Step S1083: splice the text pixels marked in each macro block to obtain the character block corresponding to each macro block.
Step S1085: splice the character blocks corresponding to the macro blocks to obtain the character block of the image to be processed.
In the above steps, the text pixels found in each macro block are marked, so that the text pixels of all macro blocks can be determined from the search results. In an optional embodiment, the text pixels in each macro block are first spliced according to the marking result, which yields the character areas of local parts of the image to be processed; the splicing results of the macro blocks are then spliced further to obtain the character area of the image to be processed.
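One possible reading of the marking and splicing steps is sketched below: the marked text pixels of every macro block are written into a single boolean mask of the full image. The function name `splice_text_masks`, the mask representation, and the `(top, left, positions)` tuple format are assumptions; the text does not prescribe a data structure for the spliced character area.

```python
def splice_text_masks(height, width, block_results):
    """Combine per-block text pixel positions into one image-sized mask.

    block_results is an iterable of (top, left, positions) tuples, where
    (top, left) is the block's offset in the image and positions is a
    set of (row, column) coordinates relative to the block.
    """
    mask = [[False] * width for _ in range(height)]
    for top, left, positions in block_results:
        for y, x in positions:
            mask[top + y][left + x] = True
    return mask
```

Pixels that remain `False` in the mask correspond to the background that is removed from the image to be processed.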
Fig. 2 is a flowchart of a method for extracting a character block from an image according to an embodiment of the present application. The method is described below with reference to Fig. 2:
Step S21: read a frame of image. Specifically, the image read is the image to be processed.
Step S22: divide the image into image blocks of M × N.
Specifically, in this step the image may be divided into 16 × 16 or 32 × 32 square blocks, or into rectangular blocks of other sizes.
Step S23: determine whether all image blocks have been processed. If all image blocks of the image have been processed, the procedure ends; otherwise, proceed to step S24 to process the current image block.
Step S24: perform a grayscale color space conversion on the current image block.
Specifically, the formula given above may be used to perform the grayscale color space conversion and obtain the gray values.
Step S25: build a statistical histogram of the M × N gray values obtained.
Step S26: find the gray value that occurs the most times, and classify it, together with the color values in the histogram that differ from it by 4 or less, as the first domain color.
In this step, the gray value that occurs the most times in the image block is found first; the color values in the histogram that differ from this gray value by 4 or less are then found, and all the color values found are classified as the first domain color.
Step S27: count the number N1 of pixels belonging to the first domain color.
Step S28: determine whether the proportion of pixels accounted for by N1 is greater than the threshold T. If so, proceed to step S219; otherwise, proceed to step S29.
Step S29: exclude the N1 pixels and find the second domain color in a manner similar to S26.
Specifically, the step of finding the second domain color is the same as that of finding the first domain color: the gray value that occurs most frequently is found in the macro block from which the pixels of the first domain color have been removed, and this gray value, together with the gray values that differ from it by less than 4, is taken as the second domain color.
Step S210: count the number N2 of pixels belonging to the second domain color.
Step S211: determine whether the proportion of pixels accounted for by S = N1 + N2 is greater than or equal to the threshold T. If so, proceed to step S219; otherwise, proceed to step S212.
Step S212: exclude the N1 + N2 pixels and find the third domain color in a manner similar to S26.
Specifically, the step of finding the third domain color is the same as that of finding the second domain color.
Step S213: count the number N3 of pixels belonging to the third domain color.
Step S214: determine whether the proportion of pixels accounted for by S = N1 + N2 + N3 is greater than or equal to the threshold T. If so, proceed to step S219; otherwise, proceed to step S215.
Step S215: exclude the N1 + N2 + N3 pixels and find the fourth domain color in a manner similar to S26.
Specifically, the step of finding the fourth domain color is the same as that of finding the other domain colors.
Step S216: count the number N4 of pixels belonging to the fourth domain color.
Step S217: determine whether the proportion of pixels accounted for by S = N1 + N2 + N3 + N4 is greater than or equal to the threshold T. If so, proceed to step S219; otherwise, proceed to step S218.
Step S218: mark this image block as containing no text pixels.
Step S219: mark the pixels covered by the domain colors as text pixels.
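The per-block part of the flow in Fig. 2 (steps S24 to S219) can be summarized in a short Python sketch. Several choices here are assumptions rather than statements of the method: the function name `label_block`, the use of the channel mean as the gray value, a "greater than or equal to" comparison at every check (the flow states "greater than" at S28 and "greater than or equal to" at S211, S214 and S217), and a tolerance of "4 or less" for every domain color (the flow states "less than 4" for the second domain color). The threshold T is supplied by the caller because no value is given.

```python
from collections import Counter


def label_block(rgb_block, threshold_t, tolerance=4, max_colors=4):
    """Return the positions of text pixels in one image block, or an
    empty set if the block is judged to contain no text pixels."""
    # S24: gray conversion (channel mean, integer division assumed).
    remaining = {(y, x): (r + g + b) // 3
                 for y, row in enumerate(rgb_block)
                 for x, (r, g, b) in enumerate(row)}
    total = len(remaining)
    covered = set()
    for _ in range(max_colors):
        if not remaining:
            break
        # S25/S26: histogram of the remaining pixels and its peak.
        peak = Counter(remaining.values()).most_common(1)[0][0]
        # S26: gray values within the tolerance join the domain color.
        group = {pos for pos, value in remaining.items()
                 if abs(value - peak) <= tolerance}
        # S27: len(group) is N1, N2, ...; accumulate and exclude.
        covered |= group
        for pos in group:
            del remaining[pos]
        # S28/S211/S214/S217: compare the covered share with T.
        if len(covered) / total >= threshold_t:
            return covered  # S219: covered pixels are text pixels.
    return set()  # S218: no text pixels in this block.
```

Applying `label_block` to every image block and writing the returned positions back at each block's offset would then yield the text pixels of the whole frame.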