CN110188764A - Character color identifying processing method and device - Google Patents
Character color identifying processing method and device
- Publication number: CN110188764A (application CN201910473365.9A)
- Authority: CN (China)
- Prior art keywords: cluster, character, color, boundary rectangle, color value
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a character color recognition processing method and device. The method includes: extracting a region to be recognized from a picture; performing connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions; for the bounding rectangle of each character region, taking the pixel color values within the bounding rectangle as cluster elements and clustering them to obtain multiple cluster color values; and comparing the cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region. According to the technical solution provided by the invention, clustering the pixel color values within the bounding rectangles obtained by connected-domain analysis allows the colors contained in a character region to be identified quickly, and comparing the cluster color values with the background color value of the peripheral region of the bounding rectangle effectively improves the accuracy of character color recognition.
Description
Technical field
The present invention relates to the field of image processing technology, and in particular to a character color recognition processing method and device.
Background art
Character recognition algorithms such as OCR can automatically recognize the characters in a picture. Existing character recognition algorithms can splice nearby characters into a passage of text according to their positional relationship. As for the color of the characters, the recognized characters are usually either set directly to a preset color such as black, or assigned a color extracted from the character portion of the picture. For example, Chinese patent application publication No. CN 102737241A provides an information processing method that determines the color of the character portion of a character string in a character string region from the character recognition result of a character recognition processing section and the character string region image, and generates character string region color information from the determined character color information.
However, to obtain a better display effect, pictures are usually processed with anti-aliasing and similar operations when they are rendered. As a result, the color value of the character portion of a picture is not a single fixed value; at edge positions in particular, the color value is usually an intermediate value produced by an interpolation operation. The prior art therefore cannot recognize character color accurately and suffers from low character color recognition accuracy.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a character color recognition processing method and device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a character color recognition processing method is provided, the method comprising:
extracting a region to be recognized from a picture;
performing connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions;
for the bounding rectangle of each character region, taking the pixel color values within the bounding rectangle as cluster elements and clustering the cluster elements to obtain multiple cluster color values;
comparing the multiple cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region.
Further, the method also comprises:
dividing the character recognition result of the region to be recognized according to the character colors of the multiple character regions to obtain multiple character groups.
Further, performing connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions comprises:
performing connected-domain analysis on the region to be recognized using a seed fill algorithm to obtain multiple connected domains;
determining multiple character regions according to the parameter information corresponding to the multiple connected domains;
for each character region, obtaining the bounding rectangle of that character region.
Further, for the bounding rectangle of each character region, taking the pixel color values within the bounding rectangle as cluster elements and clustering the cluster elements to obtain multiple cluster color values comprises:
randomly selecting K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1;
clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determining K cluster color values according to the K final cluster sets.
Further, clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers comprises:
for any cluster element, calculating the distance between that cluster element and each of the K initial cluster centers;
selecting, from the K initial cluster centers, the initial cluster center with the smallest distance to that cluster element, and assigning the cluster element to the set corresponding to the selected initial cluster center, thereby obtaining K cluster sets;
calculating the cluster centers of the K cluster sets, and judging whether the K cluster centers are identical to the K initial cluster centers;
if so, determining the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, updating the K initial cluster centers according to the K cluster centers and returning to the step of calculating, for any cluster element, the distance between that cluster element and each of the K initial cluster centers.
Further, before comparing the multiple cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region, the method also comprises:
determining the peripheral region of the bounding rectangle according to the position parameter information of the bounding rectangle;
counting the distribution of pixel color values in the peripheral region of the bounding rectangle;
according to the distribution, extracting the most frequently occurring pixel color value in the peripheral region as the background color value of the peripheral region.
Further, comparing the multiple cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region comprises:
calculating the degree of difference between each cluster color value and the background color value;
determining the cluster color value whose degree of difference meets a preset condition as the character color of the character region.
According to another aspect of the present invention, a character color recognition processing device is provided, the device comprising:
an extraction module, adapted to extract a region to be recognized from a picture;
an analysis module, adapted to perform connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions;
a cluster module, adapted to, for the bounding rectangle of each character region, take the pixel color values within the bounding rectangle as cluster elements and cluster the cluster elements to obtain multiple cluster color values;
a comparison module, adapted to compare the multiple cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region.
Further, the device also comprises:
a division module, adapted to divide the character recognition result of the region to be recognized according to the character colors of the multiple character regions to obtain multiple character groups.
Further, the analysis module is further adapted to:
perform connected-domain analysis on the region to be recognized using a seed fill algorithm to obtain multiple connected domains;
determine multiple character regions according to the parameter information corresponding to the multiple connected domains;
for each character region, obtain the bounding rectangle of that character region.
Further, the cluster module is further adapted to:
randomly select K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1;
cluster the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determine K cluster color values according to the K final cluster sets.
Further, the cluster module is further adapted to:
for any cluster element, calculate the distance between that cluster element and each of the K initial cluster centers;
select, from the K initial cluster centers, the initial cluster center with the smallest distance to that cluster element, and assign the cluster element to the set corresponding to the selected initial cluster center, thereby obtaining K cluster sets;
calculate the cluster centers of the K cluster sets, and judge whether the K cluster centers are identical to the K initial cluster centers;
if so, determine the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, update the K initial cluster centers according to the K cluster centers and return to calculating, for any cluster element, the distance between that cluster element and each of the K initial cluster centers.
Further, the device also comprises:
a peripheral region determining module, adapted to determine the peripheral region of the bounding rectangle according to the position parameter information of the bounding rectangle;
a statistical module, adapted to count the distribution of pixel color values in the peripheral region of the bounding rectangle;
a background color extraction module, adapted to extract, according to the distribution, the most frequently occurring pixel color value in the peripheral region as the background color value of the peripheral region.
Further, the comparison module is further adapted to:
calculate the degree of difference between each cluster color value and the background color value;
determine the cluster color value whose degree of difference meets a preset condition as the character color of the character region.
According to yet another aspect of the present invention, a computing device is provided, comprising: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above character color recognition processing method.
According to a further aspect of the present invention, a computer storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the operations corresponding to the above character color recognition processing method.
According to the technical solution provided by the present invention, connected-domain analysis is performed on the region to be recognized in a picture to obtain the bounding rectangles of multiple character regions, and clustering the pixel color values within each bounding rectangle allows the colors contained in a character region to be identified quickly. Comparing the cluster color values with the background color value of the peripheral region of the bounding rectangle makes it possible to determine the character color from the cluster color values more precisely. The determined character color reflects the true color of the characters in the picture more accurately, which effectively improves character color recognition accuracy.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1a shows a flow diagram of a character color recognition processing method according to an embodiment of the present invention;
Fig. 1b shows a schematic diagram of a region to be recognized;
Fig. 1c shows a schematic diagram of the multiple character regions corresponding to the region to be recognized shown in Fig. 1b;
Fig. 1d shows a schematic diagram of the peripheral region of the bounding rectangle of a character region;
Fig. 2 shows a flow diagram of a character color recognition processing method according to another embodiment of the present invention;
Fig. 3 shows a structural block diagram of a character color recognition processing device according to an embodiment of the present invention;
Fig. 4 shows a structural schematic diagram of a computing device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1a shows a flow diagram of a character color recognition processing method according to an embodiment of the present invention. As shown in Fig. 1a, the method includes the following steps:
Step S101: extract a region to be recognized from a picture.
Many business scenarios require recognizing the color of characters in pictures. In such cases, a character recognition algorithm from the prior art, such as OCR, is applied to the picture and the region to be recognized is extracted from it. The region to be recognized is the region of the picture in which characters are located, and it may contain one line of characters, multiple lines of characters, multiple passages of characters, and so on.
Step S102: perform connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions.
During connected-domain analysis, a seed fill algorithm is applied to the region to be recognized to obtain multiple connected domains, from which multiple character regions are determined; each character region corresponds to one complete, independent character. The principle of the seed fill algorithm is to start from some point inside the region to be recognized, take that point as the seed, and spread outward from it pixel by pixel until the boundary is reached. Specifically, any pixel in the region can be reached by moving in four directions (up, down, left, right) or in eight directions (up, down, left, right, upper left, lower left, upper right, lower right). After the character regions are obtained, the bounding rectangle of each character region is obtained; the bounding rectangle of a character region is the smallest rectangular box that can enclose the character region. If the region to be recognized extracted from the picture is as shown in Fig. 1b, the character regions determined may be as shown in Fig. 1c, where each box is the bounding rectangle of a character region. One bounding rectangle corresponds to one character region, one character region corresponds to one complete, independent character, and the part inside a bounding rectangle is its corresponding character region.
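The seed fill described for step S102 can be sketched as a breadth-first fill that also tracks the bounding rectangle of each connected domain. This is a minimal illustration, not the patent's implementation: the binary `mask` input (1 for foreground pixels), the function name `connected_regions`, and the `(left, top, right, bottom)` box layout are all assumptions introduced here, since the text does not say how foreground pixels are separated from the background.

```python
from collections import deque

def connected_regions(mask, eight_connected=True):
    """Return bounding rectangles (left, top, right, bottom) of connected foreground regions.

    `mask` is a list of rows of 0/1 values; 1 marks a foreground pixel (an assumption).
    """
    height, width = len(mask), len(mask[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
    if eight_connected:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # the four diagonals
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # An unvisited foreground pixel becomes the seed of a new connected domain.
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes

mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
boxes = connected_regions(mask)
```

Switching `eight_connected` to `False` restricts the fill to the four axis directions, so diagonally touching strokes are reported as separate connected domains.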
Step S103: for the bounding rectangle of each character region, take the pixel color values within the bounding rectangle as cluster elements and cluster the cluster elements to obtain multiple cluster color values.
To identify accurately and quickly which colors a character region contains, after the bounding rectangles of the character regions are obtained, the pixel color values within each bounding rectangle are taken as cluster elements and then clustered to obtain multiple cluster color values. Specifically, the color values of each pixel in the bounding rectangle in each color channel are taken as a cluster element, the cluster elements are clustered to obtain multiple final cluster sets, and the cluster color value corresponding to each final cluster set is then determined.
The bounding rectangle of a character region contains not only pixels belonging to the character content (the foreground) but also pixels belonging to the character background. Taking a single-color character background as an example, the cluster elements can be clustered into 2 cluster sets, giving 2 cluster color values. These 2 cluster color values correspond to the character color and the character background color respectively, but at this point it cannot yet be determined which of them corresponds to the character color.
Those skilled in the art may select a specific clustering algorithm according to actual needs, and no limitation is imposed here. For example, the K-means clustering algorithm, a hierarchical clustering algorithm, the SOM (self-organizing map) clustering algorithm, or the FCM (fuzzy C-means) clustering algorithm may be used to cluster the cluster elements.
Step S104: compare the multiple cluster color values with the background color value of the peripheral region of the bounding rectangle to determine the character color of the character region.
To determine the character color accurately from the cluster color values, the background color value of the peripheral region of the bounding rectangle of the character region must also be obtained. The peripheral region of the bounding rectangle may be the region within a preset distance of the edge pixels of the bounding rectangle, or the region whose outline is larger than the outline of the bounding rectangle by a preset ratio. Those skilled in the art may set the preset distance and the preset ratio according to actual needs. For example, if the preset ratio is set to 5%, the outer outline of the peripheral region is 5% larger than the outline of the bounding rectangle in directions such as length and width. For the character region corresponding to "5" in Fig. 1c, the peripheral region of its bounding rectangle may be the shaded part in Fig. 1d.
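One way to enumerate the pixels of such a peripheral region is sketched below. It is only an illustration under stated assumptions: the helper name `peripheral_pixels`, the inclusive `(left, top, right, bottom)` box layout, the reading of the preset ratio as a margin added on every side, and the `min_margin` floor of one pixel are all choices made here, not details given in the text.

```python
def peripheral_pixels(box, image_size, ratio=0.05, min_margin=1):
    """List (x, y) coordinates in the ring between a bounding rectangle and its enlarged outline.

    The margin on each side is `ratio` times the box width or height, with a floor of
    `min_margin` pixels so that small boxes still get a ring (both are assumptions).
    """
    left, top, right, bottom = box          # inclusive pixel coordinates
    img_w, img_h = image_size
    margin_x = max(min_margin, round((right - left + 1) * ratio))
    margin_y = max(min_margin, round((bottom - top + 1) * ratio))
    # Clamp the enlarged outline to the image so the ring never leaves the picture.
    x0, x1 = max(0, left - margin_x), min(img_w - 1, right + margin_x)
    y0, y1 = max(0, top - margin_y), min(img_h - 1, bottom + margin_y)
    return [
        (x, y)
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
        if not (left <= x <= right and top <= y <= bottom)   # skip the box itself
    ]

ring = peripheral_pixels((2, 2, 3, 3), image_size=(6, 6))
```

A fixed preset distance, the other option mentioned in the text, would simply replace the two computed margins with a constant.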
In step S104 the multiple cluster color values are compared with the background color value of the peripheral region of the bounding rectangle. Specifically, the cluster color value that differs less from the background color value of the peripheral region may be taken as the character background color, and the cluster color value that differs more may be taken as the character color, thereby determining the character color. In this way, the interference that processing such as anti-aliasing introduces into character color recognition can be effectively eliminated, and the determined character color reflects the true color of the characters in the picture more accurately.
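The comparison in step S104 can be illustrated with a short sketch. The text does not specify how the degree of difference is measured, so the Euclidean distance between RGB triples used here is an assumption, as are the helper names `color_distance` and `pick_character_color` and the sample color values.

```python
def color_distance(a, b):
    """Euclidean distance between two color tuples (an assumed measure of difference)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def pick_character_color(cluster_colors, background):
    """Return the cluster color that differs most from the background color."""
    return max(cluster_colors, key=lambda color: color_distance(color, background))

# Two cluster colors from one bounding rectangle: a near-white and a dark one.
clusters = [(250, 250, 250), (20, 20, 30)]
character_color = pick_character_color(clusters, background=(255, 255, 255))
```

With a white background the near-white cluster is treated as the character background and the dark cluster is returned as the character color.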
The character color recognition processing method provided in this embodiment performs connected-domain analysis on the region to be recognized in a picture to obtain the bounding rectangles of multiple character regions, and clusters the pixel color values within each bounding rectangle so that the colors contained in a character region can be identified quickly. Comparing the cluster color values with the background color value of the peripheral region of the bounding rectangle makes it possible to determine the character color from the cluster color values more precisely. The determined character color reflects the true color of the characters in the picture more accurately, which effectively improves character color recognition accuracy.
Fig. 2 shows a flow diagram of a character color recognition processing method according to another embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S201: extract a region to be recognized from a picture.
A character recognition algorithm from the prior art, such as OCR, is applied to the picture to obtain a character recognition result, which may include information such as character positions and character content. The region to be recognized is extracted from the picture according to the character positions in the character recognition result. Specifically, the character positions indicate where characters appear in the picture; the regions corresponding to those positions are extracted from the picture, and the extracted region is taken as the region to be recognized.
Step S202: perform connected-domain analysis on the region to be recognized to obtain the bounding rectangles of multiple character regions.
Specifically, a seed fill algorithm is applied to the region to be recognized to obtain multiple connected domains. Some characters consist of several disconnected, independent parts. For example, the character "i" has two independent parts, upper and lower, and the character "%" has three independent parts, at the upper left, the middle, and the lower right. After connected-domain analysis, such a character is split into several connected domains. The connected domains therefore need further processing, using an algorithm such as a proximity search algorithm and parameter information such as the content, position, and size corresponding to the connected domains, to determine the multiple character regions. For each character region, the bounding rectangle of that character region is then obtained.
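The text does not detail the proximity search, so the following is only a hypothetical heuristic showing what merging split connected domains could look like: boxes whose horizontal extents overlap and whose vertical gap is at most `max_gap` pixels are merged into one character region. The function name `merge_boxes`, the `max_gap` parameter, and the `(left, top, right, bottom)` box layout are assumptions introduced here.

```python
def merge_boxes(boxes, max_gap=2):
    """Merge boxes that overlap horizontally and lie within `max_gap` pixels vertically.

    Boxes are (left, top, right, bottom) with inclusive coordinates. This is a single
    greedy pass, so it is a simplification rather than a full proximity search.
    """
    merged = []
    for box in sorted(boxes):
        for i, other in enumerate(merged):
            overlap_x = box[0] <= other[2] and other[0] <= box[2]
            # Number of empty rows between the two boxes (negative if they overlap).
            gap_y = max(box[1], other[1]) - min(box[3], other[3]) - 1
            if overlap_x and gap_y <= max_gap:
                merged[i] = (
                    min(box[0], other[0]), min(box[1], other[1]),
                    max(box[2], other[2]), max(box[3], other[3]),
                )
                break
        else:
            merged.append(box)   # no nearby box found: start a new character region
    return merged

# The stem and the dot of an "i", followed by a separate character to the right.
parts = [(4, 3, 5, 9), (4, 0, 5, 1), (8, 0, 12, 9)]
character_boxes = merge_boxes(parts)
```

This heuristic handles vertically stacked parts such as the dot of "i"; a character like "%", whose parts are arranged diagonally, would need a different proximity rule, and the content and size information mentioned in the text is not used here.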
Step S203: for the bounding rectangle of each character region, take the pixel color values within the bounding rectangle as cluster elements, and randomly select K cluster elements from the cluster elements as K initial cluster centers.
This embodiment uses the K-means clustering algorithm. The color values of each pixel in the bounding rectangle in each color channel are taken as a cluster element. If the picture uses the RGB color standard, the pixel color value consists of the pixel's color values in the red (R), green (G), and blue (B) color channels. K cluster elements are then randomly selected from the cluster elements as K initial cluster centers, where K is greater than 1. The bounding rectangle of a character region contains pixels belonging to the character content and pixels belonging to the character background. Taking the case where the character background in the bounding rectangle is a single color as an example, K can be set to 2, and 2 cluster elements are randomly selected from the cluster elements as the 2 initial cluster centers.
Step S204: cluster the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers.
The cluster elements are clustered according to the K initial cluster centers until a convergence condition is met, giving K final cluster centers and the corresponding K final cluster sets. The convergence condition may be that the cluster centers of the K cluster sets no longer change. The clustering can be carried out through the following steps 1 to 5.
Step 1: for any cluster element, calculate the distance between that cluster element and each of the K initial cluster centers.
Step 2: select, from the K initial cluster centers, the initial cluster center with the smallest distance to that cluster element, and assign the cluster element to the set corresponding to the selected initial cluster center, thereby obtaining K cluster sets.
Step 3: calculate the cluster centers of the K cluster sets, and judge whether the K cluster centers are identical to the K initial cluster centers; if so, execute step 4; if not, execute step 5.
Step 4: determine the K cluster centers as the K final cluster centers, and determine the K cluster sets as the K final cluster sets.
Step 5: update the K initial cluster centers according to the K cluster centers, and return to step 1.
In step 1, the Euclidean distance between the cluster element and each of the K initial cluster centers may be calculated. The Euclidean distance is the true distance between two points in m-dimensional space, or the natural length of a vector. If step 3 finds that the K cluster centers are identical to the K initial cluster centers, the K cluster sets meet the convergence condition, so the K cluster centers are determined as the K final cluster centers and the K cluster sets as the K final cluster sets. If step 3 finds that the K cluster centers differ from the K initial cluster centers, the K cluster sets do not yet meet the convergence condition and clustering must continue, so the K initial cluster centers are updated according to the K cluster centers; that is, the K cluster centers become the updated K initial cluster centers. After the update, execution returns to step 1: the distance between each cluster element and the updated K initial cluster centers is calculated, and the cluster elements are reassigned according to the calculated distances.
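Steps 1 to 5 above can be sketched in plain Python as follows. This is a minimal illustration rather than the patent's implementation: the function names, the `seed` and `max_iter` parameters, and the sample pixel values are assumptions. Two deliberate simplifications are also made: the initial centers are drawn from the distinct colors (so the input needs at least K distinct colors and no cluster starts as a duplicate), and squared Euclidean distance is used, which selects the same nearest center as the Euclidean distance.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_colors(pixels, k=2, seed=0, max_iter=100):
    """Cluster color tuples; return (final cluster centers, final cluster sets)."""
    rng = random.Random(seed)
    # Randomly pick K distinct colors as the initial cluster centers.
    centers = rng.sample(sorted(set(pixels)), k)
    clusters = []
    for _ in range(max_iter):   # safety cap on iterations (an added assumption)
        # Steps 1 and 2: assign every element to its nearest current center.
        clusters = [[] for _ in range(k)]
        for pixel in pixels:
            nearest = min(range(k), key=lambda i: squared_distance(pixel, centers[i]))
            clusters[nearest].append(pixel)
        # Step 3: recompute each center as the per-channel mean of its set.
        new_centers = [
            tuple(sum(channel) / len(members) for channel in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:   # Step 4: centers unchanged, clustering has converged.
            break
        centers = new_centers        # Step 5: update the centers and repeat.
    return centers, clusters

# Dark character pixels with anti-aliased edges on a near-white background.
pixels = (
    [(0, 0, 0)] * 6 + [(10, 10, 10)] * 2
    + [(255, 255, 255)] * 8 + [(245, 245, 245)] * 2
)
centers, clusters = kmeans_colors(pixels, k=2)
```

Each returned center is the mean color of its final cluster set, so it can serve directly as that set's cluster color value.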
Step S205: determine K cluster color values according to the K final cluster sets.
For each final cluster set, the pixel color value of the cluster center of that set may be taken as the cluster color value corresponding to the set. Alternatively, the average of the pixel color values of the cluster elements contained in the final cluster set may be calculated, and the calculated average taken as the cluster color value corresponding to the set.
To determine the character color accurately from the K cluster color values, the background color value of the peripheral region of the bounding rectangle of the character region must also be obtained. Before step S206, the method may therefore further include: determining the peripheral region of the bounding rectangle according to the position parameter information of the bounding rectangle; counting the distribution of pixel color values in the peripheral region of the bounding rectangle; and, according to the distribution, extracting the most frequently occurring pixel color value in the peripheral region as the background color value of the peripheral region. Specifically, according to the position parameter information of the bounding rectangle, the region within a preset distance of the edge pixels of the bounding rectangle, or the region whose outline is larger than the outline of the bounding rectangle by a preset ratio, is determined as the peripheral region of the bounding rectangle. Note that the peripheral region should not be too large and should not overlap other character regions, so as to reduce the introduction of interfering pixels.
The peripheral region of the boundary rectangle contains multiple pixels, and the pixel color values of these pixels may not all be the same. The distribution of pixel color values in the peripheral region can therefore be counted, that is, how many pixels in the peripheral region share each pixel color value. The most widely distributed pixel color value is the one that occurs most frequently in the peripheral region and is therefore its dominant color, so it can be taken as the background color value of the peripheral region. Compared with randomly picking some pixel color value in the peripheral region as its background color value, the present invention determines the background color value from the distribution of pixel color values in the peripheral region, which reflects the true background color of the peripheral region more accurately.
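The determination of the peripheral region and the extraction of its most frequent pixel color value can be sketched as follows. This is a minimal sketch under stated assumptions: the image is a row-major list of (R, G, B) tuples, the boundary rectangle is given as inclusive `(left, top, right, bottom)` coordinates, and the peripheral region is the band within `margin` pixels of the rectangle (the "preset distance" variant). The names `surrounding_pixels`, `background_color` and `margin` are hypothetical, and the check against overlap with other character regions is not shown.

```python
from collections import Counter
from typing import List, Tuple

Color = Tuple[int, int, int]        # assumed (R, G, B) pixel color value
Image = List[List[Color]]           # assumed layout: image[y][x]
Rect = Tuple[int, int, int, int]    # assumed (left, top, right, bottom), inclusive


def surrounding_pixels(image: Image, rect: Rect, margin: int) -> List[Color]:
    """Collect the pixels within `margin` of the rectangle but outside it."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = rect
    # Expand the rectangle by the preset distance, clipped to the image.
    x0, y0 = max(0, left - margin), max(0, top - margin)
    x1, y1 = min(width - 1, right + margin), min(height - 1, bottom + margin)
    pixels = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            inside = left <= x <= right and top <= y <= bottom
            if not inside:
                pixels.append(image[y][x])
    return pixels


def background_color(image: Image, rect: Rect, margin: int = 2) -> Color:
    """Return the most frequent pixel color value in the peripheral region."""
    pixels = surrounding_pixels(image, rect, margin)
    if not pixels:
        raise ValueError("the peripheral region is empty")
    # Counter tallies how many pixels share each color value.
    return Counter(pixels).most_common(1)[0][0]
```

A small `margin` keeps the band narrow, in line with the note that the peripheral region should not be too large.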
Step S206: the K cluster color values are compared with the background color value of the peripheral region of the boundary rectangle, and the character color of the character region is determined.
In prior-art picture design, the character color is generally set to a color that differs markedly from the character background color so that the characters display well in the picture; for example, the character color is black and the character background color is white, or the character color is white and the character background color is black. Therefore, once the K cluster color values and the background color value of the peripheral region of the boundary rectangle have been determined, the degree of difference between each cluster color value and the background color value of the peripheral region can be calculated, and the cluster color value whose degree of difference satisfies a preset condition is determined as the character color of the character region. Those skilled in the art may set the preset condition according to actual needs; for example, the preset condition may be that the degree of difference ranks first, i.e. is the largest. The larger a cluster color value's degree of difference, the more likely it is the character color; the smaller its degree of difference, the more likely it is the character background color. The cluster color value with the largest degree of difference can therefore be determined as the character color of the character region. In this way, the interference that processing such as anti-aliasing of the picture introduces into character color recognition can be effectively eliminated, the character color is determined more accurately, and the accuracy of character color recognition is effectively improved.
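The comparison in step S206 can be illustrated with the short sketch below. The text does not fix how the degree of difference is measured, so Euclidean distance in RGB space is used here purely as an assumption; the function names `color_difference` and `pick_character_color` are likewise hypothetical. The preset condition is taken to be "largest degree of difference", as in the example above.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # assumed (R, G, B) color value


def color_difference(a: Color, b: Color) -> float:
    """Assumed measure of the degree of difference: Euclidean distance in RGB."""
    return math.dist(a, b)  # requires Python 3.8 or later


def pick_character_color(cluster_colors: Sequence[Color], background: Color) -> Color:
    """Return the cluster color value that differs most from the background."""
    if not cluster_colors:
        raise ValueError("at least one cluster color value is required")
    return max(cluster_colors, key=lambda c: color_difference(c, background))
```

With a white background and cluster color values of near-white, mid-grey (an anti-aliased edge) and near-black, the near-black value is selected, which is how intermediate anti-aliasing colors are kept from being reported as the character color.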
Step S207: the character recognition result in the region to be identified is divided according to the character colors of the multiple character regions, obtaining multiple character groups.
In the prior art, characters belonging to different business attributes are usually distinguished by different character colors, i.e. characters of different colors correspond to different business attributes. The character recognition result in the region to be identified can therefore be divided according to the character colors of the multiple character regions to obtain multiple character groups. Specifically, the character content in the character recognition result that has the same character color is placed in one group, yielding multiple character groups. Characters in the same character group have the same character color and the same corresponding business attribute. Suppose the multiple character regions are as shown in Fig. 1c, and step S206 determines that the character color of the character regions "今", "日", "特" and "价" is red, that of the character regions "1", "5", ".", "9", "0" and "元" is orange, and that of the character regions "6" and "折" is black. The multiple character groups obtained in step S207 then include: the character group "今日特价" (daily special), the character group "15.90元" (15.90 yuan) and the character group "6折" (6 fold, i.e. 60% of the original price).
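The division in step S207 can be sketched as a grouping of recognized characters by their character color. In this sketch the colors are plain string labels, which assumes that the character colors from step S206 have already been normalised so that equal colors compare equal; the function name `group_by_color` is hypothetical, and the sample characters are an assumed reconstruction of the Fig. 1c example.

```python
from typing import Dict, List, Tuple


def group_by_color(chars: List[Tuple[str, str]]) -> Dict[str, str]:
    """Group recognized characters by color, keeping their reading order."""
    groups: Dict[str, List[str]] = {}
    for text, color in chars:
        # Characters with the same character color go into the same group.
        groups.setdefault(color, []).append(text)
    return {color: "".join(parts) for color, parts in groups.items()}


# Assumed reconstruction of the example: (character, character color label).
recognized = [
    ("今", "red"), ("日", "red"), ("特", "red"), ("价", "red"),
    ("1", "orange"), ("5", "orange"), (".", "orange"),
    ("9", "orange"), ("0", "orange"), ("元", "orange"),
    ("6", "black"), ("折", "black"),
]
character_groups = group_by_color(recognized)
```

Each resulting group can then be stored or used separately according to its business attribute.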
In the character color identification processing method provided by this embodiment, the K-means clustering algorithm is used to cluster the pixel color values in the boundary rectangles of the character regions obtained by connected domain analysis, and multiple cluster color values are determined from the multiple final cluster sets obtained by clustering. The resulting cluster color values reflect fairly accurately which colors the character region contains, so the colors contained in the character region are identified quickly. The background color value of the peripheral region is determined from the distribution of pixel color values in the peripheral region, which reflects the true background color of the peripheral region more accurately; comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color effectively improves the accuracy of character color recognition. Moreover, the recognized characters are effectively divided according to the character colors of the multiple character regions, which improves the accuracy of character recognition processing and lets users conveniently store and use the multiple character groups separately. In addition, the method makes full use of the character recognition result produced by an existing character recognition algorithm and requires no modification of that algorithm, which greatly reduces development cost and improves the efficiency of character recognition processing.
Fig. 3 shows a structural block diagram of a character color identification processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes: an extraction module 310, an analysis module 320, a cluster module 330 and a comparison module 340.
The extraction module 310 is adapted to extract a region to be identified from a picture.
The analysis module 320 is adapted to perform connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character regions.
Optionally, the analysis module 320 is further adapted to: perform connected domain analysis on the region to be identified using a seed fill algorithm to obtain multiple connected domains; determine multiple character regions according to the parameter information corresponding to the multiple connected domains; and, for each character region, obtain the boundary rectangle of that character region.
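The seed-fill connected domain analysis mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes the region to be identified has already been binarized into a foreground mask, uses 4-connectivity, and returns one boundary rectangle per connected domain. The step of deciding from each domain's parameter information which domains form character regions is not shown, and the name `connected_domain_rects` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # assumed (left, top, right, bottom), inclusive


def connected_domain_rects(mask: List[List[bool]]) -> List[Rect]:
    """Label 4-connected foreground domains by seed filling and return their boundary rectangles."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects: List[Rect] = []
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # (sx, sy) is a new seed: flood outwards from it.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            left = right = sx
            top = bottom = sy
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rects.append((left, top, right, bottom))
    return rects
```

An explicit queue is used instead of recursion so that large connected domains do not exhaust the call stack.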
The cluster module 330 is adapted to: for the boundary rectangle of each character region, take the pixel color values in the boundary rectangle as cluster elements and cluster the cluster elements to obtain multiple cluster color values.
Optionally, the cluster module 330 is further adapted to: randomly select K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1; cluster the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers; and determine K cluster color values according to the K final cluster sets.
Optionally, the cluster module 330 is further adapted to: for any cluster element, calculate the distance between that cluster element and each of the K initial cluster centers; select, from the K initial cluster centers, the initial cluster center nearest to that cluster element and assign the cluster element to the set corresponding to the selected initial cluster center, obtaining K cluster sets; calculate the cluster centers of the K cluster sets and judge whether the K cluster centers are the same as the K initial cluster centers; if so, determine the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, update the K initial cluster centers according to the K cluster centers and return to the step of calculating, for any cluster element, the distance between that cluster element and the K initial cluster centers.
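The iterative clustering described above corresponds to the standard K-means loop and can be sketched as follows. Several details are assumptions made for this sketch rather than part of the embodiment: colors are (R, G, B) tuples, distance is squared Euclidean distance, the initial centers are drawn from the distinct color values so that no two centers coincide, an empty set keeps its previous center, and `max_iter` is a safety guard. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, float, float]  # assumed (R, G, B) pixel color value


def squared_distance(a: Color, b: Color) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_color(points: Sequence[Color], fallback: Color) -> Color:
    """Center of one cluster set; an empty set keeps its previous center."""
    if not points:
        return fallback
    n = len(points)
    return tuple(sum(p[c] for p in points) / n for c in range(3))


def kmeans_colors(pixels: Sequence[Color], k: int,
                  seed: Optional[int] = None,
                  max_iter: int = 100) -> Tuple[List[Color], List[List[Color]]]:
    """Return (final cluster centers, final cluster sets) for the pixel colors."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    distinct = sorted(set(pixels))
    if len(distinct) < k:
        raise ValueError("fewer distinct color values than K")
    # Randomly select K cluster elements as the initial cluster centers.
    centers: List[Color] = random.Random(seed).sample(distinct, k)
    clusters: List[List[Color]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # Assign each element to the set of its nearest center.
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [mean_color(c, centers[i]) for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # centers unchanged: these are the final centers and sets
        centers = new_centers
    return centers, clusters
```

The K cluster color values can then be taken from the returned centers, or computed from the returned sets.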
The comparison module 340 is adapted to compare the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle and determine the character color of the character region.
Optionally, the comparison module 340 is further adapted to: calculate the degree of difference between each cluster color value and the background color value; and determine the cluster color value whose degree of difference satisfies a preset condition as the character color of the corresponding character region.
Optionally, the device further includes a division module 350, adapted to divide the character recognition result in the region to be identified according to the character colors of the multiple character regions, obtaining multiple character groups.
Optionally, the device further includes: a peripheral region determining module 360, adapted to determine the peripheral region of the boundary rectangle according to the position parameter information of the boundary rectangle; a statistical module 370, adapted to count the distribution of pixel color values in the peripheral region of the boundary rectangle; and a background color extraction module 380, adapted to extract, according to the distribution, the pixel color value that occurs most often in the peripheral region as the background color value of the peripheral region.
In the character color identification processing device provided by this embodiment, the K-means clustering algorithm is used to cluster the pixel color values in the boundary rectangles of the character regions obtained by connected domain analysis, and multiple cluster color values are determined from the multiple final cluster sets obtained by clustering. The resulting cluster color values reflect fairly accurately which colors the character region contains, so the colors contained in the character region are identified quickly. The background color value of the peripheral region is determined from the distribution of pixel color values in the peripheral region, which reflects the true background color of the peripheral region more accurately; comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color effectively improves the accuracy of character color recognition. Moreover, the recognized characters are effectively divided according to the character colors of the multiple character regions, which improves the accuracy of character recognition processing and lets users conveniently store and use the multiple character groups separately. In addition, the device makes full use of the character recognition result produced by an existing character recognition algorithm and requires no modification of that algorithm, which greatly reduces development cost and improves the efficiency of character recognition processing.
The present invention also provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the executable instruction can execute the character color identification processing method of any of the method embodiments above.
Fig. 4 shows a structural schematic diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406 and a communication bus 408.
Wherein:
The processor 402, the communications interface 404 and the memory 406 communicate with one another through the communication bus 408.
The communications interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is used to execute a program 410 and may specifically perform the relevant steps of the character color identification processing method embodiments above.
Specifically, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM and may also include non-volatile memory, for example at least one disk storage device.
The program 410 may specifically be used to cause the processor 402 to execute the character color identification processing method of any of the method embodiments above. For the specific implementation of each step in the program 410, refer to the corresponding descriptions of the corresponding steps and units in the character color identification processing embodiments above, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to embodiments of the invention. The invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
Claims (10)
1. A character color identification processing method, the method comprising:
extracting a region to be identified from a picture;
performing connected domain analysis on the region to be identified to obtain boundary rectangles of multiple character regions;
for the boundary rectangle of each character region, taking the pixel color values in the boundary rectangle as cluster elements, and clustering the cluster elements to obtain multiple cluster color values;
comparing the multiple cluster color values with a background color value of a peripheral region of the boundary rectangle to determine the character color of the character region.
2. The method according to claim 1, wherein the method further comprises:
dividing the character recognition result in the region to be identified according to the character colors of the multiple character regions, to obtain multiple character groups.
3. The method according to claim 1, wherein the performing connected domain analysis on the region to be identified to obtain boundary rectangles of multiple character regions further comprises:
performing connected domain analysis on the region to be identified using a seed fill algorithm to obtain multiple connected domains;
determining multiple character regions according to the parameter information corresponding to the multiple connected domains;
for each character region, obtaining the boundary rectangle of the character region.
4. The method according to claim 1, wherein the taking, for the boundary rectangle of each character region, the pixel color values in the boundary rectangle as cluster elements and clustering the cluster elements to obtain multiple cluster color values further comprises:
randomly selecting K cluster elements from the cluster elements as K initial cluster centers, wherein K is greater than 1;
clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determining K cluster color values according to the K final cluster sets.
5. The method according to claim 4, wherein the clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers further comprises:
for any cluster element, calculating the distance between the cluster element and each of the K initial cluster centers;
selecting, from the K initial cluster centers, the initial cluster center nearest to the cluster element, and assigning the cluster element to the set corresponding to the selected initial cluster center, to obtain K cluster sets;
calculating the cluster centers of the K cluster sets, and judging whether the K cluster centers are the same as the K initial cluster centers;
if so, determining the K cluster centers as K final cluster centers and the K cluster sets as K final cluster sets; if not, updating the K initial cluster centers according to the K cluster centers, and returning to the step of calculating, for any cluster element, the distance between the cluster element and the K initial cluster centers.
6. The method according to any one of claims 1-5, wherein, before the comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character region, the method further comprises:
determining the peripheral region of the boundary rectangle according to the position parameter information of the boundary rectangle;
counting the distribution of pixel color values in the peripheral region of the boundary rectangle;
extracting, according to the distribution, the pixel color value that occurs most often in the peripheral region as the background color value of the peripheral region.
7. The method according to any one of claims 1-6, wherein the comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character region further comprises:
calculating the degree of difference between each cluster color value and the background color value;
determining the cluster color value whose degree of difference satisfies a preset condition as the character color of the character region.
8. A character color identification processing device, the device comprising:
an extraction module, adapted to extract a region to be identified from a picture;
an analysis module, adapted to perform connected domain analysis on the region to be identified to obtain boundary rectangles of multiple character regions;
a cluster module, adapted to take, for the boundary rectangle of each character region, the pixel color values in the boundary rectangle as cluster elements, and to cluster the cluster elements to obtain multiple cluster color values;
a comparison module, adapted to compare the multiple cluster color values with a background color value of a peripheral region of the boundary rectangle to determine the character color of the character region.
9. A computing device, comprising: a processor, a memory, a communications interface and a communication bus, wherein the processor, the memory and the communications interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the character color identification processing method according to any one of claims 1-7.
10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the operations corresponding to the character color identification processing method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910473365.9A CN110188764A (en) | 2019-05-31 | 2019-05-31 | Character color identifying processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110188764A true CN110188764A (en) | 2019-08-30 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190830 |