CN110188764A - Character color identifying processing method and device - Google Patents

Info

Publication number
CN110188764A
Authority
CN
China
Prior art keywords
cluster
character
color
boundary rectangle
color value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910473365.9A
Other languages
Chinese (zh)
Inventor
罗光玮
钱鸿强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Koubei Network Technology Co Ltd
Original Assignee
Zhejiang Koubei Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Koubei Network Technology Co Ltd
Priority to CN201910473365.9A
Publication of CN110188764A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a character color recognition processing method and device. The method includes: extracting a region to be identified from a picture; performing connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones; for the boundary rectangle of each character zone, taking the pixel color values in the boundary rectangle as cluster elements and clustering the cluster elements to obtain multiple cluster color values; and comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone. According to the technical solution provided by the invention, clustering the pixel color values in the boundary rectangles of the character zones obtained by connected domain analysis quickly identifies the colors contained in each character zone, and determining the character color by comparing the cluster color values with the background color value of the peripheral region of the boundary rectangle effectively improves the accuracy of character color recognition.

Description

Character color identifying processing method and device
Technical field
The present invention relates to the field of image processing, and in particular to a character color recognition processing method and device.
Background
Character recognition algorithms such as OCR can automatically recognize the characters in a picture. Existing character recognition algorithms can splice characters that are close to one another into a segment of text according to their positional relationship. As for the color of the characters, the recognized characters are usually either set directly to a preset color such as black, or assigned a color extracted from the character portion of the picture. For example, Chinese patent application publication No. CN 102737241A provides an information processing method that determines the color of the character portion of a character string in a character string region according to the character recognition result from a character recognition processing part and the character string region image, and generates character string region color information according to the determined character color information.
However, to obtain a better display effect, pictures are mostly subjected to processing such as anti-aliasing. As a result, the color value of the character portion of a picture is not a single fixed value; at edge positions in particular, the color values are usually intermediate values generated by interpolation. The prior art therefore cannot identify character color accurately and suffers from low character color recognition accuracy.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a character color recognition processing method and device that overcome the above problems or at least partially solve them.
According to one aspect of the invention, a character color recognition processing method is provided, the method comprising:
extracting a region to be identified from a picture;
performing connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones;
for the boundary rectangle of each character zone, taking the pixel color values in the boundary rectangle as cluster elements, and clustering the cluster elements to obtain multiple cluster color values;
comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone.
Further, the method also comprises:
dividing the character recognition results in the region to be identified according to the character colors of the multiple character zones, to obtain multiple character groups.
Further, performing connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones further comprises:
performing connected domain analysis on the region to be identified using a seed fill algorithm to obtain multiple connected domains;
determining multiple character zones according to the parameter information corresponding to the multiple connected domains;
for each character zone, obtaining the boundary rectangle of the character zone.
Further, for the boundary rectangle of each character zone, taking the pixel color values in the boundary rectangle as cluster elements and clustering the cluster elements to obtain multiple cluster color values further comprises:
randomly selecting K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1;
clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determining K cluster color values according to the K final cluster sets.
Further, clustering the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers further comprises:
for any cluster element, calculating the distances between the cluster element and the K initial cluster centers;
selecting, from the K initial cluster centers, the initial cluster center with the smallest distance to the cluster element, and assigning the cluster element to the set corresponding to the selected initial cluster center, to obtain K cluster sets;
calculating the cluster centers of the K cluster sets, and judging whether the K cluster centers are identical to the K initial cluster centers;
if so, determining the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, updating the K initial cluster centers according to the K cluster centers, and returning to the step of calculating, for any cluster element, the distances between the cluster element and the K initial cluster centers.
Further, before comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone, the method also comprises:
determining the peripheral region of the boundary rectangle according to the position parameter data of the boundary rectangle;
counting the distribution of pixel color values in the peripheral region of the boundary rectangle;
extracting, according to the distribution, the pixel color value that occurs most often in the peripheral region as the background color value of the peripheral region.
Further, comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone further comprises:
calculating the degree of difference between each cluster color value and the background color value;
determining the cluster color value whose degree of difference meets a preset condition as the character color of the character zone.
According to another aspect of the invention, a character color recognition processing device is provided, the device comprising:
an extraction module, adapted to extract a region to be identified from a picture;
an analysis module, adapted to perform connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones;
a cluster module, adapted to, for the boundary rectangle of each character zone, take the pixel color values in the boundary rectangle as cluster elements and cluster the cluster elements to obtain multiple cluster color values;
a comparison module, adapted to compare the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone.
Further, the device also comprises:
a division module, adapted to divide the character recognition results in the region to be identified according to the character colors of the multiple character zones, to obtain multiple character groups.
Further, the analysis module is further adapted to:
perform connected domain analysis on the region to be identified using a seed fill algorithm to obtain multiple connected domains;
determine multiple character zones according to the parameter information corresponding to the multiple connected domains;
for each character zone, obtain the boundary rectangle of the character zone.
Further, the cluster module is further adapted to:
randomly select K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1;
cluster the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determine K cluster color values according to the K final cluster sets.
Further, the cluster module is further adapted to:
for any cluster element, calculate the distances between the cluster element and the K initial cluster centers;
select, from the K initial cluster centers, the initial cluster center with the smallest distance to the cluster element, and assign the cluster element to the set corresponding to the selected initial cluster center, to obtain K cluster sets;
calculate the cluster centers of the K cluster sets, and judge whether the K cluster centers are identical to the K initial cluster centers;
if so, determine the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, update the K initial cluster centers according to the K cluster centers, and return to calculating, for any cluster element, the distances between the cluster element and the K initial cluster centers.
Further, the device also comprises:
a peripheral region determining module, adapted to determine the peripheral region of the boundary rectangle according to the position parameter data of the boundary rectangle;
a statistical module, adapted to count the distribution of pixel color values in the peripheral region of the boundary rectangle;
a background color extraction module, adapted to extract, according to the distribution, the pixel color value that occurs most often in the peripheral region as the background color value of the peripheral region.
Further, the comparison module is further adapted to:
calculate the degree of difference between each cluster color value and the background color value;
determine the cluster color value whose degree of difference meets a preset condition as the character color of the character zone.
According to yet another aspect of the invention, a computing device is provided, comprising a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above character color recognition processing method.
According to still another aspect of the invention, a computer storage medium is provided, in which at least one executable instruction is stored, the executable instruction causing a processor to perform the operations corresponding to the above character color recognition processing method.
According to the technical solution provided by the invention, connected domain analysis is performed on the region to be identified in a picture to obtain the boundary rectangles of multiple character zones, and clustering the pixel color values in each boundary rectangle quickly identifies the colors contained in the character zone. Comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle allows the character color to be determined more precisely from among the cluster color values, so that the determined character color reflects the true color of the characters in the picture more accurately and character color recognition accuracy is effectively improved.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals refer to the same parts. In the drawings:
Fig. 1a shows a flow diagram of a character color recognition processing method according to one embodiment of the invention;
Fig. 1b shows a schematic diagram of a region to be identified;
Fig. 1c shows a schematic diagram of the multiple character zones corresponding to the region to be identified shown in Fig. 1b;
Fig. 1d shows a schematic diagram of the peripheral region of the boundary rectangle of a character zone;
Fig. 2 shows a flow diagram of a character color recognition processing method according to another embodiment of the invention;
Fig. 3 shows a structural block diagram of a character color recognition processing device according to an embodiment of the invention;
Fig. 4 shows a structural schematic diagram of a computing device according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1a shows a flow diagram of a character color recognition processing method according to one embodiment of the invention. As shown in Fig. 1a, the method comprises the following steps:
Step S101: extract a region to be identified from a picture.
In many business scenarios there is a need to identify the color of the characters in a picture. A character recognition algorithm from the prior art, such as OCR, can be used to perform character recognition on the picture and extract the region to be identified, which is the region of the picture where characters are located. The region to be identified may contain one line of characters, multiple lines of characters, multiple paragraphs of characters, and so on.
Step S102: perform connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones.
During connected domain analysis, a seed fill algorithm is applied to the region to be identified to obtain multiple connected domains, from which multiple character zones are determined; each character zone corresponds to one complete, independent character. The principle of the seed fill algorithm is to start from a point inside the region to be identified, take that point as the seed, and spread outward from it until the boundary is reached. Specifically, any pixel in the region can be reached by moving in four directions (up, down, left, right) or in eight directions (up, down, left, right, upper left, lower left, upper right, lower right). After the multiple character zones are obtained, the boundary rectangle of each character zone is obtained, where the boundary rectangle of a character zone is the smallest rectangular frame that can enclose the character zone. If the region to be identified extracted from the picture is as shown in Fig. 1b, the determined character zones may be as shown in Fig. 1c. Each frame in Fig. 1c is the boundary rectangle of a character zone: one boundary rectangle corresponds to one character zone, one character zone corresponds to one complete, independent character, and the part inside a boundary rectangle is its corresponding character zone.
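By way of illustration only, the seed fill and boundary rectangle steps described above could be sketched in Python as follows. The sketch assumes the region to be identified has already been binarized into a 2D grid in which truthy cells are foreground; the disclosure does not specify how foreground pixels are chosen, and the function name `connected_components` and the `(left, top, right, bottom)` tuple layout are hypothetical.

```python
from collections import deque

def connected_components(mask, connectivity=4):
    """Seed-fill every foreground cell of a 2D grid.

    Returns one boundary rectangle (left, top, right, bottom), with
    inclusive coordinates, per connected domain, in scan order.
    """
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # 8-connectivity adds the four diagonal neighbours
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]

    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # (y, x) is a new seed: spread outward until the boundary.
            queue = deque([(y, x)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes
```

With four directions, diagonally touching strokes are reported as separate connected domains; with eight directions they are joined, which is the practical difference between the two options named above.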
Step S103: for the boundary rectangle of each character zone, take the pixel color values in the boundary rectangle as cluster elements, and cluster the cluster elements to obtain multiple cluster color values.
To identify accurately and quickly which colors a character zone contains, after the boundary rectangles of the character zones are obtained, for the boundary rectangle of each character zone the pixel color values in the boundary rectangle are taken as cluster elements and clustered to obtain multiple cluster color values. Specifically, the color values of each pixel in the boundary rectangle in each color channel are taken as a cluster element, the cluster elements are clustered to obtain multiple final cluster sets, and the cluster color value corresponding to each final cluster set is then determined.
The boundary rectangle of a character zone contains not only pixels corresponding to the character content (the foreground) but also pixels corresponding to the character background. Taking a character background of a single color as an example, the cluster elements can be clustered into 2 cluster sets, yielding 2 cluster color values. These 2 cluster color values correspond to the character color and the character background color respectively, but at this point it cannot be determined which cluster color value corresponds to the character color.
Those skilled in the art may select a specific clustering algorithm according to actual needs, which is not limited here. For example, the K-means clustering algorithm, a hierarchical clustering algorithm, the SOM (self-organizing map) clustering algorithm, or the FCM (fuzzy C-means) clustering algorithm may be used to cluster the cluster elements.
Step S104: compare the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color of the character zone.
To determine the character color accurately from the multiple cluster color values, the background color value of the peripheral region of the boundary rectangle of the character zone must also be obtained. The peripheral region of the boundary rectangle may be the region within a preset distance of the edge pixels of the boundary rectangle, or the region obtained by enlarging the outline of the boundary rectangle by a preset ratio. Those skilled in the art may set the preset distance and the preset ratio according to actual needs. For example, if the preset ratio is set to 5%, the outer outline of the peripheral region is 5% larger than the outline of the boundary rectangle in both length and width. For the character zone corresponding to "5" in Fig. 1c, the peripheral region of its boundary rectangle may be the shaded part in Fig. 1d.
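As a minimal sketch of the preset-ratio variant, the peripheral region could be computed as below. Several details are assumptions rather than part of the disclosure: the margin is taken as `ratio` times the width (or height) on each side, a hypothetical `min_margin` keeps the ring at least one pixel thick for small characters, and the result is clamped to the picture bounds. The names `peripheral_region` and `ring_pixels` are illustrative only.

```python
def peripheral_region(box, image_width, image_height, ratio=0.05, min_margin=1):
    """Return the outer outline (left, top, right, bottom) of the peripheral
    region of `box`, enlarged by `ratio` per side and clamped to the picture."""
    left, top, right, bottom = box
    width, height = right - left + 1, bottom - top + 1
    # Margin per side: ratio of the dimension, but never thinner than min_margin.
    dx = max(min_margin, round(width * ratio))
    dy = max(min_margin, round(height * ratio))
    return (max(0, left - dx), max(0, top - dy),
            min(image_width - 1, right + dx), min(image_height - 1, bottom + dy))

def ring_pixels(outer, inner):
    """Yield the (x, y) coordinates inside `outer` but outside `inner`."""
    outer_left, outer_top, outer_right, outer_bottom = outer
    inner_left, inner_top, inner_right, inner_bottom = inner
    for y in range(outer_top, outer_bottom + 1):
        for x in range(outer_left, outer_right + 1):
            inside_inner = (inner_left <= x <= inner_right
                            and inner_top <= y <= inner_bottom)
            if not inside_inner:
                yield (x, y)
```

The ring returned by `ring_pixels` plays the role of the shaded part in Fig. 1d: only these pixels, not the pixels inside the boundary rectangle, would be sampled for the background color value.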
In step S104, the multiple cluster color values are compared with the background color value of the peripheral region of the boundary rectangle. Specifically, the cluster color value with the smaller degree of difference from the background color value of the peripheral region can be taken as the character background color, and the cluster color value with the larger degree of difference as the character color, thereby determining the character color. In this way, the interference that processing such as anti-aliasing introduces into character color recognition can be effectively eliminated, and the determined character color reflects the true color of the characters in the picture more accurately.
In the character color recognition processing method provided by this embodiment, connected domain analysis is performed on the region to be identified in a picture to obtain the boundary rectangles of multiple character zones, and clustering the pixel color values in each boundary rectangle quickly identifies the colors contained in the character zone. Comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle allows the character color to be determined more precisely from among the cluster color values, so that the determined character color reflects the true color of the characters in the picture more accurately and character color recognition accuracy is effectively improved.
Fig. 2 shows a flow diagram of a character color recognition processing method according to another embodiment of the invention. As shown in Fig. 2, the method comprises the following steps:
Step S201: extract a region to be identified from a picture.
A character recognition algorithm from the prior art, such as OCR, is used to perform character recognition on the picture and obtain a character recognition result, which may include information such as character positions and character content. The region to be identified is extracted from the picture according to the character positions in the character recognition result. Specifically, the character positions indicate where in the picture characters are present; the regions corresponding to those positions are extracted from the picture, and the extracted region is taken as the region to be identified.
Step S202: perform connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones.
Specifically, a seed fill algorithm is used to perform connected domain analysis on the region to be identified, yielding multiple connected domains. Some characters consist of several disconnected independent parts: the character "i" has an upper and a lower part, and the character "%" has an upper-left, a middle, and a lower-right part. After connected domain analysis, such a character is split into several connected domains. The connected domains therefore also need to be processed with an algorithm such as a proximity search algorithm, according to parameter information such as the content, position, and size corresponding to each connected domain, in order to determine the multiple character zones. For each character zone, the boundary rectangle of the character zone is then obtained.
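The disclosure does not detail the proximity search, so the following is only one possible, deliberately simplified sketch. It assumes that parts of the same character overlap horizontally and are separated vertically by at most a hypothetical `max_gap` pixels, which covers a case like "i" but not necessarily "%"; the function name `merge_stacked_boxes` is likewise illustrative.

```python
def merge_stacked_boxes(boxes, max_gap=3):
    """Merge boundary rectangles (left, top, right, bottom) whose x-ranges
    overlap and whose vertical coordinate gap is at most `max_gap`."""
    def should_merge(a, b):
        overlap_x = min(a[2], b[2]) - max(a[0], b[0]) >= 0
        # Negative or zero when the two boxes already overlap vertically.
        gap_y = max(a[1], b[1]) - min(a[3], b[3])
        return overlap_x and gap_y <= max_gap

    merged = [tuple(b) for b in boxes]
    changed = True
    while changed:  # repeat until no pair can be merged any more
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if should_merge(a, b):
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

A production implementation would likely also use the recognized character content and size, as the paragraph above indicates; this sketch uses position only.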
Step S203: for the boundary rectangle of each character zone, take the pixel color values in the boundary rectangle as cluster elements, and randomly select K cluster elements from the cluster elements as K initial cluster centers.
In this embodiment, the K-means clustering algorithm is used. The color values of each pixel in the boundary rectangle in each color channel are taken as a cluster element; if the picture uses the RGB color standard, the pixel color value comprises the color values of the pixel in the red (R), green (G), and blue (B) color channels. K cluster elements are then randomly selected from the cluster elements as K initial cluster centers, where K is greater than 1. Because the boundary rectangle of a character zone contains both pixels corresponding to the character content and pixels corresponding to the character background, taking a single-color character background as an example, K can be set to 2, and 2 cluster elements are randomly selected from the cluster elements as the 2 initial cluster centers.
Step S204: cluster the cluster elements according to the K initial cluster centers to determine K final cluster centers and K final cluster sets corresponding to the K final cluster centers.
The cluster elements are clustered according to the K initial cluster centers until a convergence condition is met, yielding K final cluster centers and the corresponding K final cluster sets. The convergence condition may be that the cluster centers of the K cluster sets no longer change. The clustering can be implemented through the following steps 1 to 5.
Step 1: for any cluster element, calculate the distances between the cluster element and the K initial cluster centers.
Step 2: select, from the K initial cluster centers, the initial cluster center with the smallest distance to the cluster element, and assign the cluster element to the set corresponding to the selected initial cluster center, obtaining K cluster sets.
Step 3: calculate the cluster centers of the K cluster sets, and judge whether the K cluster centers are identical to the K initial cluster centers; if so, execute step 4; if not, execute step 5.
Step 4: determine the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets.
Step 5: update the K initial cluster centers according to the K cluster centers, and return to step 1.
In step 1, the Euclidean distance between the cluster element and each of the K initial cluster centers can be calculated. The Euclidean distance is the true distance between two points in m-dimensional space, or the natural length of a vector. If step 3 finds that the K cluster centers are identical to the K initial cluster centers, the K cluster sets meet the convergence condition, so the K cluster centers are determined as the K final cluster centers and the K cluster sets as the K final cluster sets. If step 3 finds that the K cluster centers differ from the K initial cluster centers, the K cluster sets do not yet meet the convergence condition and clustering must continue, so the K initial cluster centers are updated according to the K cluster centers, that is, the K cluster centers become the updated K initial cluster centers. After the update, the process returns to step 1: the distances between each cluster element and the updated K initial cluster centers are calculated, and the cluster elements are reassigned according to the calculated distances.
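Steps 1 to 5 above can be sketched in plain Python as follows. This is an illustrative sketch, not the applicant's implementation: the name `kmeans_colors`, the `max_iter` safeguard, and the `seed` parameter are assumptions, and the initial centers are drawn from distinct color values (a small deviation from purely random selection) so that two centers never start out identical. Squared Euclidean distance is used because it selects the same nearest center as Euclidean distance.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two colour tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_colors(pixels, k=2, max_iter=100, seed=None):
    """Cluster colour tuples; returns (centers, clusters)."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(pixels))  # unique colours, order preserved
    if len(distinct) < k:
        raise ValueError("fewer distinct colours than clusters")
    centers = [tuple(float(c) for c in p) for p in rng.sample(distinct, k)]

    for _ in range(max_iter):
        # Steps 1 and 2: assign every element to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centre as the per-channel mean of its set.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(
                    tuple(sum(channel) / len(members) for channel in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the centre of an empty set
        if new_centers == centers:  # Step 4: centres unchanged, converged
            break
        centers = new_centers       # Step 5: update and repeat from step 1
    return centers, clusters
```

For example, a boundary rectangle holding 12 white background pixels, 6 black stroke pixels, and 2 near-black anti-aliased edge pixels of value (10, 10, 10) converges to centers (2.5, 2.5, 2.5) and (255.0, 255.0, 255.0), so the edge pixels are absorbed into the stroke cluster rather than forming a color of their own.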
Step S205: determine K cluster color values according to the K final cluster sets.
For each final cluster set, the pixel color value of the cluster center of the final cluster set can be taken as the cluster color value corresponding to that final cluster set. Alternatively, the average of the pixel color values of the cluster elements contained in the final cluster set can be calculated, and the calculated average taken as the cluster color value corresponding to that final cluster set.
To determine the character color accurately from the K cluster color values, the background color value of the peripheral region of the boundary rectangle of the character zone must also be obtained. Before step S206, the method may therefore further include: determining the peripheral region of the boundary rectangle according to the position parameter information of the boundary rectangle; counting the distribution of pixel color values in the peripheral region of the boundary rectangle; and, according to the distribution, extracting the most widely distributed pixel color value in the peripheral region as the background color value of the peripheral region. Specifically, according to the position parameter information of the boundary rectangle, either the region within a preset distance of the edge pixels of the boundary rectangle, or the region that is larger than the outline of the boundary rectangle by a preset ratio, is determined as the peripheral region of the boundary rectangle. Note that the peripheral region should not be too large and should not overlap other character zones, so as to reduce the introduction of interfering pixels.
The peripheral region of the boundary rectangle contains multiple pixels whose pixel color values may not all be the same, so the distribution of pixel color values in the peripheral region can be counted, that is, how many pixels in the peripheral region share each pixel color value. The most widely distributed pixel color value occurs most frequently in the peripheral region and is the region's dominant color, so it can be used as the background color value of the peripheral region. Compared with randomly selecting some pixel color value in the peripheral region as the background color value, the background color value that the present invention determines from the distribution of pixel color values reflects the true background color of the peripheral region more accurately.
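A minimal sketch of this background color extraction is given below. It assumes the picture is a list of rows of RGB tuples, that the boundary rectangle is given as `(x, y, width, height)`, and that the peripheral region is the band of pixels within `margin` pixels of the rectangle (the "preset distance" variant). These representations and the function names are assumptions made for the example.

```python
from collections import Counter


def peripheral_pixels(image, rect, margin):
    """Yield the colors of pixels within `margin` of the rectangle but outside it."""
    x, y, w, h = rect
    height, width = len(image), len(image[0])
    # Clip the enlarged rectangle to the picture so border rectangles still work.
    for row in range(max(0, y - margin), min(height, y + h + margin)):
        for col in range(max(0, x - margin), min(width, x + w + margin)):
            inside = x <= col < x + w and y <= row < y + h
            if not inside:
                yield image[row][col]


def background_color(image, rect, margin=2):
    """Return the most frequent pixel color value in the peripheral region."""
    counts = Counter(peripheral_pixels(image, rect, margin))
    if not counts:
        raise ValueError("peripheral region is empty")
    return counts.most_common(1)[0][0]
```

Keeping `margin` small follows the note that the peripheral region should not be too large; a check for overlap with neighboring character zones is left out of this sketch.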
Step S206: compare the K cluster color values with the background color value of the peripheral region of the boundary rectangle, and determine the character color of the character zone.
In prior-art picture design, to ensure that characters display well in a picture, the character color is generally set to a color that differs considerably from the character background color; for example, the character color is black and the character background color is white, or the character color is white and the character background color is black. Therefore, after the K cluster color values and the background color value of the peripheral region of the boundary rectangle have been determined, the degree of difference between each cluster color value and the background color value of the peripheral region can be calculated, and the cluster color value whose degree of difference meets a preset condition is determined as the character color of the character zone. Those skilled in the art can set the preset condition according to actual needs; for example, the preset condition may be that the degree of difference ranks highest, that is, the degree of difference is the largest. The larger a cluster color value's degree of difference, the more likely it is the character color; the smaller the degree of difference, the more likely it is the character background color. The cluster color value with the largest degree of difference can therefore be determined as the character color of the character zone. In this way, the interference that processing such as anti-aliasing of the picture introduces into character color recognition can be effectively eliminated, the character color is determined more accurately, and character color recognition accuracy is effectively improved.
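The comparison can be sketched as below for the case where the preset condition is "largest difference". The text does not fix how the difference between two colors is measured; this sketch assumes the Euclidean distance between RGB tuples, and the function name is hypothetical.

```python
import math


def pick_character_color(cluster_colors, background):
    """Return the cluster color value that differs most from the background."""
    # Assumption: the difference is measured as Euclidean distance in RGB space.
    return max(cluster_colors, key=lambda color: math.dist(color, background))
```

With a white background, for instance, a near-white cluster produced by anti-aliased edge pixels scores a small difference and is passed over in favor of the darker stroke color.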
Step S207: divide the character recognition result in the region to be identified according to the character colors of the multiple character zones, obtaining multiple character groups.
In the prior art, characters with different service attributes are usually distinguished by different character colors, that is, characters of different colors correspond to different service attributes. The character recognition result in the region to be identified can therefore be divided according to the character colors of the multiple character zones to obtain multiple character groups. Specifically, the character contents in the character recognition result that share the same character color are divided into one group, yielding multiple character groups. Characters in the same character group have the same character color and the same service attribute. Suppose the multiple character zones are as shown in Fig. 1c, and step S206 determines that the character color of the character zones "今", "日", "特" and "价" is red, that of the character zones "1", "5", ".", "9", "0" and "元" is orange, and that of the character zones "6" and "折" is black. The multiple character groups obtained in step S207 then include: the character group "今日特价" (today's special), the character group "15.90元" (15.90 yuan) and the character group "6折" (40% off).
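The division into character groups can be sketched as follows. The input format, a list of `(character, color)` pairs in reading order, and the function name are assumptions of this sketch; the sample data in the usage lines is made up to mirror the three colors of the example and is not taken from the figures.

```python
def group_by_color(recognized):
    """Join characters that share a character color into one group each.

    `recognized` is a list of (character, color) pairs in reading order.
    Groups are returned in order of first appearance of their color.
    """
    groups = {}
    for char, color in recognized:
        groups.setdefault(color, []).append(char)
    return ["".join(chars) for chars in groups.values()]


# Hypothetical input: three colors, one service attribute per color.
RED, ORANGE, BLACK = (255, 0, 0), (255, 165, 0), (0, 0, 0)
sample = [(c, RED) for c in "SALE"] + [(c, ORANGE) for c in "15.90"] + [("6", BLACK)]
groups = group_by_color(sample)
```

The sketch groups by exact color equality. In practice the character colors determined for two zones may differ slightly, so a tolerance or a second clustering pass over the character colors may be needed; that choice is outside what the text specifies.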
The character color identifying processing method provided in this embodiment uses the K-means clustering algorithm to cluster the pixel color values within the boundary rectangle of each character zone obtained by connected domain analysis, and determines multiple cluster color values from the multiple final cluster sets obtained by clustering. The resulting cluster color values reflect fairly accurately which colors the character zone contains, so the colors included in the character zone are identified quickly. The background color value of the peripheral region is determined from the distribution of pixel color values in the peripheral region, which reflects the true background color of the peripheral region more accurately; comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color effectively improves character color recognition accuracy. Furthermore, the recognized characters are divided effectively according to the character colors of the multiple character zones, which improves character recognition accuracy and lets users conveniently store and use the multiple character groups separately. In addition, the method makes full use of the character recognition results produced by existing character recognition algorithms without modifying those algorithms, which greatly saves development cost and improves character recognition processing efficiency.
Fig. 3 shows a structural block diagram of a character color identifying processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes: an extraction module 310, an analysis module 320, a cluster module 330 and a comparison module 340.
The extraction module 310 is adapted to: extract a region to be identified from a picture.
The analysis module 320 is adapted to: perform connected domain analysis on the region to be identified, obtaining the boundary rectangles of multiple character zones.
Optionally, the analysis module 320 is further adapted to: perform connected domain analysis on the region to be identified using a seed fill algorithm, obtaining multiple connected domains; determine multiple character zones according to the parameter information corresponding to the multiple connected domains; and, for each character zone, obtain the boundary rectangle of that character zone.
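A seed fill pass of this kind can be sketched as follows. The sketch assumes the region to be identified has already been binarized into a mask of 0 (background) and 1 (foreground) values, uses 4-connectivity, and returns one boundary rectangle `(x, y, width, height)` per connected domain; the binarization, the connectivity, and the function name are assumptions, and the filtering of connected domains into character zones by their parameter information is omitted.

```python
from collections import deque


def connected_domains(mask):
    """Return the boundary rectangle (x, y, w, h) of each 4-connected domain."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Seed fill: grow the domain outward from this unvisited seed pixel.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            min_x = max_x = sx
            min_y = max_y = sy
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rects.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return rects
```

An explicit queue is used instead of recursion so that large connected domains do not exhaust the call stack.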
The cluster module 330 is adapted to: for the boundary rectangle of each character zone, take the pixel color values in the boundary rectangle as cluster elements and perform clustering processing on the cluster elements, obtaining multiple cluster color values.
Optionally, the cluster module 330 is further adapted to: randomly select K cluster elements from the cluster elements as K initial cluster centers, where K is greater than 1; perform clustering processing on the cluster elements according to the K initial cluster centers, determining K final cluster centers and K final cluster sets corresponding to the K final cluster centers; and determine K cluster color values according to the K final cluster sets.
Optionally, the cluster module 330 is further adapted to: for any cluster element, calculate the distance between that cluster element and each of the K initial cluster centers; select from the K initial cluster centers the initial cluster center with the smallest distance to that cluster element, and assign the cluster element to the set corresponding to the selected initial cluster center, obtaining K cluster sets; calculate the cluster centers of the K cluster sets, and judge whether the K cluster centers are identical to the K initial cluster centers; if so, determine the K cluster centers as the K final cluster centers and the K cluster sets as the K final cluster sets; if not, update the K initial cluster centers according to the K cluster centers, and jump back to the step of calculating, for any cluster element, the distance between that cluster element and each of the K initial cluster centers.
The comparison module 340 is adapted to: compare the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle, and determine the character color of the character zone.
Optionally, the comparison module 340 is further adapted to: calculate the degree of difference between each cluster color value and the background color value; and determine the cluster color value whose degree of difference meets a preset condition as the character color of the character zone.
Optionally, the device further includes: a division module 350, adapted to divide the character recognition result in the region to be identified according to the character colors of the multiple character zones, obtaining multiple character groups.
Optionally, the device further includes: a peripheral region determining module 360, adapted to determine the peripheral region of the boundary rectangle according to the position parameter information of the boundary rectangle; a statistical module 370, adapted to count the distribution of pixel color values in the peripheral region of the boundary rectangle; and a background color extraction module 380, adapted to extract, according to the distribution, the most widely distributed pixel color value in the peripheral region as the background color value of the peripheral region.
The character color identifying processing device provided in this embodiment uses the K-means clustering algorithm to cluster the pixel color values within the boundary rectangle of each character zone obtained by connected domain analysis, and determines multiple cluster color values from the multiple final cluster sets obtained by clustering. The resulting cluster color values reflect fairly accurately which colors the character zone contains, so the colors included in the character zone are identified quickly. The background color value of the peripheral region is determined from the distribution of pixel color values in the peripheral region, which reflects the true background color of the peripheral region more accurately; comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle to determine the character color effectively improves character color recognition accuracy. Furthermore, the recognized characters are divided effectively according to the character colors of the multiple character zones, which improves character recognition accuracy and lets users conveniently store and use the multiple character groups separately. In addition, this approach makes full use of the character recognition results produced by existing character recognition algorithms without modifying those algorithms, which greatly saves development cost and improves character recognition processing efficiency.
The present invention also provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the executable instruction can perform the character color identifying processing method of any of the above method embodiments.
Fig. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406 and a communication bus 408.
Wherein:
The processor 402, the communication interface 404 and the memory 406 communicate with one another through the communication bus 408.
The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is used to execute a program 410, and may specifically perform the relevant steps of the above character color identifying processing method embodiments.
Specifically, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk storage.
The program 410 may specifically be used to cause the processor 402 to perform the character color identifying processing method of any of the above method embodiments. For the specific implementation of each step in the program 410, refer to the corresponding descriptions of the corresponding steps and units in the above character color identifying processing embodiments, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, which are likewise not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It is to be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in an embodiment can be combined into one module or unit or component, and can furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A character color identifying processing method, the method comprising:
extracting a region to be identified from a picture;
performing connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones;
for the boundary rectangle of each character zone, taking the pixel color values in the boundary rectangle as cluster elements, and performing clustering processing on the cluster elements to obtain multiple cluster color values;
comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle, and determining the character color of the character zone.
2. The method according to claim 1, wherein the method further comprises:
dividing the character recognition result in the region to be identified according to the character colors of the multiple character zones, to obtain multiple character groups.
3. The method according to claim 1, wherein performing connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones further comprises:
performing connected domain analysis on the region to be identified using a seed fill algorithm, to obtain multiple connected domains;
determining multiple character zones according to the parameter information corresponding to the multiple connected domains;
for each character zone, obtaining the boundary rectangle of the character zone.
4. The method according to claim 1, wherein, for the boundary rectangle of each character zone, taking the pixel color values in the boundary rectangle as cluster elements and performing clustering processing on the cluster elements to obtain multiple cluster color values further comprises:
randomly selecting K cluster elements from the cluster elements as K initial cluster centers, wherein K is greater than 1;
performing clustering processing on the cluster elements according to the K initial cluster centers, and determining K final cluster centers and K final cluster sets corresponding to the K final cluster centers;
determining K cluster color values according to the K final cluster sets.
5. The method according to claim 4, wherein performing clustering processing on the cluster elements according to the K initial cluster centers, and determining K final cluster centers and K final cluster sets corresponding to the K final cluster centers, further comprises:
for any cluster element, calculating the distance between the cluster element and each of the K initial cluster centers;
selecting from the K initial cluster centers the initial cluster center with the smallest distance to the cluster element, and assigning the cluster element to the set corresponding to the selected initial cluster center, to obtain K cluster sets;
calculating the cluster centers of the K cluster sets, and judging whether the K cluster centers are identical to the K initial cluster centers;
if so, determining the K cluster centers as the K final cluster centers, and determining the K cluster sets as the K final cluster sets; if not, updating the K initial cluster centers according to the K cluster centers, and jumping back to the step of calculating, for any cluster element, the distance between the cluster element and each of the K initial cluster centers.
6. The method according to any one of claims 1-5, wherein, before comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle and determining the character color of the character zone, the method further comprises:
determining the peripheral region of the boundary rectangle according to the position parameter information of the boundary rectangle;
counting the distribution of pixel color values in the peripheral region of the boundary rectangle;
according to the distribution, extracting the most widely distributed pixel color value in the peripheral region as the background color value of the peripheral region.
7. The method according to any one of claims 1-6, wherein comparing the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle and determining the character color of the character zone further comprises:
calculating the degree of difference between each cluster color value and the background color value;
determining the cluster color value whose degree of difference meets a preset condition as the character color of the character zone.
8. A character color identifying processing device, the device comprising:
an extraction module, adapted to extract a region to be identified from a picture;
an analysis module, adapted to perform connected domain analysis on the region to be identified to obtain the boundary rectangles of multiple character zones;
a cluster module, adapted to, for the boundary rectangle of each character zone, take the pixel color values in the boundary rectangle as cluster elements, and perform clustering processing on the cluster elements to obtain multiple cluster color values;
a comparison module, adapted to compare the multiple cluster color values with the background color value of the peripheral region of the boundary rectangle, and determine the character color of the character zone.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the character color identifying processing method according to any one of claims 1-7.
10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the character color identifying processing method according to any one of claims 1-7.
CN201910473365.9A 2019-05-31 2019-05-31 Character color identifying processing method and device Pending CN110188764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910473365.9A CN110188764A (en) 2019-05-31 2019-05-31 Character color identifying processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910473365.9A CN110188764A (en) 2019-05-31 2019-05-31 Character color identifying processing method and device

Publications (1)

Publication Number Publication Date
CN110188764A true CN110188764A (en) 2019-08-30

Family

ID=67719634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910473365.9A Pending CN110188764A (en) 2019-05-31 2019-05-31 Character color identifying processing method and device

Country Status (1)

Country Link
CN (1) CN110188764A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1419679A (en) * 2000-03-14 2003-05-21 英特尔公司 Estimating text color and segmentation of images
US20140023267A1 (en) * 2011-03-10 2014-01-23 Omron Corporation Character string detection device, image processing device, character string detection method, control program and storage medium
CN104573685A (en) * 2015-01-29 2015-04-29 中南大学 Natural scene text detecting method based on extraction of linear structures
CN105740860A (en) * 2016-01-28 2016-07-06 河南大学 Automatic detection method for Chinese character area of shop sign in natural scene
CN106874937A (en) * 2017-01-18 2017-06-20 腾讯科技(上海)有限公司 A kind of character image generation method, device and terminal
CN107784301A (en) * 2016-08-31 2018-03-09 百度在线网络技术(北京)有限公司 Method and apparatus for identifying character area in image
CN109447086A (en) * 2018-09-19 2019-03-08 浙江口碑网络技术有限公司 A kind of extracting method and device of picture character color

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN SONG 等: "A Novel Image Text Extraction Method Based on K-means Clustering", 《INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE》 *
刘华颖: "基于角点与颜色特征的视频文本提取算法", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488885A (en) * 2020-06-28 2020-08-04 成都四方伟业软件股份有限公司 Intelligent extraction method and device for theme color system of picture
CN111488885B (en) * 2020-06-28 2020-09-25 成都四方伟业软件股份有限公司 Intelligent extraction method and device for theme color system of picture
CN112861985A (en) * 2021-02-24 2021-05-28 郑州轻工业大学 Automatic book classification method based on artificial intelligence
CN112861985B (en) * 2021-02-24 2023-01-31 郑州轻工业大学 Automatic book classification method based on artificial intelligence
CN113223016A (en) * 2021-05-13 2021-08-06 上海西虹桥导航技术有限公司 Image segmentation method and device for plant seedlings, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190830)