CN111563505A - Character detection method and device based on pixel segmentation and merging - Google Patents

Character detection method and device based on pixel segmentation and merging

Info

Publication number
CN111563505A
CN111563505A (application CN201910114195.5A)
Authority
CN
China
Prior art keywords
pixel
pixel points
pixel point
picture
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910114195.5A
Other languages
Chinese (zh)
Inventor
田伟伟
董健
颜水成
卢禹锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201910114195.5A priority Critical patent/CN111563505A/en
Publication of CN111563505A publication Critical patent/CN111563505A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a character detection method and device based on pixel segmentation and merging, wherein the method comprises the following steps: extracting feature information of a picture to be detected, and generating a feature map corresponding to the picture to be detected according to the extracted feature information; performing pixel segmentation on the feature map to obtain a plurality of pixel points, and analyzing the confidence score of each pixel point belonging to a text pixel point; extracting the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within a preset range, and forming a plurality of connected domains of pixel points according to the extracted position information; and merging the pixel points in the same connected domain, and determining the text region on the picture to be detected by using the merged pixel points. By performing pixel segmentation on the feature map of the picture to be detected, forming a plurality of connected domains according to the position information of the text pixel points, and merging the pixel points within each connected domain, the method can effectively and accurately determine the text region on the picture to be detected from the merged pixel points.

Description

Character detection method and device based on pixel segmentation and merging
Technical Field
The invention relates to the technical field of character detection, in particular to a character detection method and device based on pixel segmentation and merging.
Background
In the prior art, general character detection mainly covers character detection in natural scenes and character detection in scanned text images, and two methods are commonly used at present. The first is character detection based on object detection: a convolutional neural network extracts, for each point of feature maps at different scales, a confidence coefficient and the corresponding text box coordinates, and duplicate candidate boxes are then removed by Non-Maximum Suppression (NMS). The second performs pixel segmentation through a convolutional neural network and regresses, for each pixel point, the confidence that it belongs to characters together with the coordinates of its text box.
However, each of these methods has its own problems and therefore serves poorly as a general character detection technology: the first cannot accurately fit the coordinate position of a pixel point when the characters have a large rotation angle, and the second cannot completely detect long text owing to the limited size of the convolution kernel.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a text detection method and apparatus based on pixel segmentation and merging that overcome, or at least partially solve, the above problems.
According to an aspect of the present invention, a text detection method based on pixel segmentation and merging is provided, which includes:
extracting feature information of a picture to be detected, and generating a feature map corresponding to the picture to be detected according to the extracted feature information;
performing pixel segmentation on the feature map to obtain a plurality of pixel points, and analyzing the confidence score of each pixel point belonging to a text pixel point;
extracting the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within a preset range, and forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points;
and merging the pixel points in the same connected domain, and determining the text region on the picture to be detected by using the merged pixel points.
Optionally, extracting the feature information of the picture to be detected and generating the feature map corresponding to the picture to be detected according to the extracted feature information includes:
extracting the feature information of the picture to be detected based on a deep learning model with a UNet network structure;
and performing up-sampling, down-sampling and the corresponding convolution operations on the picture to be detected according to the extracted feature information to obtain the feature map corresponding to the picture to be detected.
Optionally, the higher the confidence score of the pixel point belonging to the text pixel point is, the higher the probability of the pixel point belonging to the text pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of the pixel point belonging to the character pixel point is, the smaller the probability of the pixel point belonging to the character pixel point is and the larger the probability of the pixel point belonging to the picture background is.
Optionally, extracting position information of the pixel point within the preset confidence score range in the text box to which the pixel point belongs, including:
obtaining, from the plurality of pixel points obtained by segmentation, the pixel points whose confidence scores are greater than a preset score;
and extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.
Optionally, the forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points includes:
judging, according to the extracted coordinate values of any pixel point, whether that pixel point is cascaded with the pixel points in specified directions within the text box to which it belongs;
if so, cascading that pixel point with the pixel points in the specified directions within the text box to which it belongs;
and forming a plurality of connected domains from the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are cascaded with one another.
Optionally, the forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points includes:
and carrying out binarization processing on the feature map, and forming a plurality of connected domains of the pixel points in the image after binarization processing according to the extracted position information of the pixel points.
Optionally, merging the pixel points in the same connected domain, and determining the text region on the picture to be detected by using the merged pixel points, including:
merging the pixel points in the same connected domain, and locating the position information of the corresponding text box according to the position information of the merged pixel points;
and determining the text region on the picture to be detected according to the position information of the text box.
According to another aspect of the present invention, there is provided a text detection apparatus based on pixel segmentation and merging, including:
the generating module is suitable for extracting the characteristic information of the picture to be detected and generating a characteristic graph corresponding to the picture to be detected according to the extracted characteristic information;
the analysis module is suitable for performing pixel segmentation on the feature map to obtain a plurality of pixel points and analyzing the confidence score of each pixel point belonging to a text pixel point;
the forming module is suitable for extracting the position information of the pixel points within the preset confidence score value range in the text box to which the pixel points belong, and forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
and the merging module is suitable for merging the pixel points in the same connected domain, and determining the character region on the picture to be detected by using the merged pixel points.
Optionally, the generating module is further adapted to:
extracting the characteristic information of the picture to be detected based on a UNet network structure deep learning model;
and performing up-sampling, down-sampling and corresponding convolution operation on the picture to be detected according to the extracted feature information to obtain a feature map corresponding to the picture to be detected.
Optionally, the higher the confidence score of the pixel point belonging to the text pixel point is, the higher the probability of the pixel point belonging to the text pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of the pixel point belonging to the character pixel point is, the smaller the probability of the pixel point belonging to the character pixel point is and the larger the probability of the pixel point belonging to the picture background is.
Optionally, the forming module comprises an acquiring unit and an extracting unit,
the acquisition unit is suitable for acquiring pixel points with confidence scores larger than preset scores from the plurality of pixel points obtained by segmentation;
the extraction unit is suitable for extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.
Optionally, the forming module further includes:
the judging unit is suitable for judging whether any pixel point is cascaded with the pixel point in the appointed direction in the text box to which the pixel point belongs according to the extracted coordinate value of the any pixel point;
the forming unit is suitable for cascading the any pixel point and the pixel point in the appointed direction in the text box to which the any pixel point belongs if the judging unit determines that the any pixel point and the pixel point in the appointed direction in the text box to which the any pixel point belongs are cascaded;
and forming a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are mutually cascaded.
Optionally, the forming module is further adapted to:
and carrying out binarization processing on the feature map, and forming a plurality of connected domains of the pixel points in the image after binarization processing according to the extracted position information of the pixel points.
Optionally, the merging module is further adapted to:
merging the pixel points in the same connected domain, and positioning the position information of the corresponding text frame according to the position information of the merged pixel points;
and determining the character area on the picture to be detected according to the position information of the character frame.
According to yet another aspect of the present invention, there is also provided a computer storage medium having stored thereon computer program code which, when run on a computing device, causes the computing device to execute the method for text detection based on pixel segmentation and merging as described in any of the embodiments above.
In accordance with yet another aspect of the present invention, there is also provided a computing device comprising: a processor; a memory storing computer program code; the computer program code, when executed by the processor, causes the computing device to perform a method for pixel segmentation and merging based text detection as described in any of the embodiments above.
In the embodiment of the invention, the feature information of the picture to be detected is extracted, and the feature map corresponding to the picture to be detected is generated according to the extracted feature information; the feature map is then subjected to pixel segmentation to obtain a plurality of pixel points, and the confidence score of each pixel point belonging to a text pixel point is analyzed; the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within the preset range is extracted, and a plurality of connected domains of pixel points are formed according to the extracted position information; finally, the pixel points in the same connected domain are merged, and the text region on the picture to be detected is determined by using the merged pixel points. The embodiment of the invention thus performs pixel segmentation on the feature map of the picture to be detected, forms a plurality of connected domains according to the position information of the text pixel points, and merges the pixel points within each connected domain; since each connected domain corresponds to one piece of text, the text region on the picture to be detected can be determined effectively and accurately from the merged pixel points. Furthermore, because the text region is determined from connected domains formed among pixel points, rotated text, long text and the like on the picture to be detected can be detected effectively.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a text detection method based on pixel segmentation and merging according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a text detection process based on pixel segmentation and merging according to one embodiment of the invention;
FIG. 3 is a schematic structural diagram of a text detection apparatus based on pixel segmentation and merging according to an embodiment of the present invention; and
fig. 4 is a schematic structural diagram of a text detection apparatus based on pixel segmentation and merging according to another embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to solve the above technical problem, an embodiment of the present invention provides a text detection method based on pixel segmentation and merging. Fig. 1 is a flow chart of a text detection method based on pixel segmentation and merging according to an embodiment of the present invention. Referring to fig. 1, the method includes at least steps S102 to S108.
Step S102: extracting the feature information of the picture to be detected, and generating a feature map corresponding to the picture to be detected according to the extracted feature information.
Step S104: performing pixel segmentation on the feature map to obtain a plurality of pixel points, and analyzing the confidence score of each pixel point belonging to a text pixel point.
Step S106: extracting the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within a preset range, and forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points.
Step S108: merging the pixel points in the same connected domain, and determining the text region on the picture to be detected by using the merged pixel points.
In the embodiment of the invention, the feature information of the picture to be detected may first be extracted, and the feature map corresponding to the picture to be detected is generated according to the extracted feature information; the feature map is then subjected to pixel segmentation to obtain a plurality of pixel points, and the confidence score of each pixel point belonging to a text pixel point is analyzed; the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within the preset range is extracted, and a plurality of connected domains of pixel points are formed according to the extracted position information; finally, the pixel points in the same connected domain are merged, and the text region on the picture to be detected is determined by using the merged pixel points. In this way, a plurality of connected domains are formed according to the position information of the pixel points obtained by segmenting the picture to be detected, and the pixel points within each connected domain are then merged; since each connected domain corresponds to one piece of text, the text region on the picture to be detected can be determined effectively and accurately from the merged pixel points. Furthermore, because the text region is determined from connected domains formed among pixel points, rotated text, long text and the like on the picture to be detected can be detected effectively.
Referring to step S102, in an embodiment of the present invention, when the feature information of the picture to be detected is extracted and the feature map corresponding to the picture to be detected is generated according to the extracted feature information, the feature information may be extracted based on a deep learning model with a UNet network structure. The picture to be detected is then subjected to up-sampling, down-sampling and the corresponding convolution operations according to the extracted feature information to obtain the feature map corresponding to the picture to be detected; in other words, the picture to be detected is processed by convolution kernels to obtain the corresponding feature map.
The feature map may refer to a high-level semantic feature map, which is essentially a matrix. For example, the high-level semantic feature map may be a three-dimensional matrix of 100 by 100 by 10, in which each 100 by 100 slice can be regarded as one feature map.
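As an illustration of this kind of encoder-decoder feature extractor, the following is a minimal PyTorch sketch; it is not the network claimed in the patent, and the layer counts, channel widths and the 100-by-100-by-10 output shape are assumptions chosen only to match the example above.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal UNet-style encoder-decoder: down-sample, up-sample, fuse skip features."""
    def __init__(self, in_ch=3, feat_ch=10):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)   # down-sampling
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)  # up-sampling
        self.dec1 = nn.Sequential(nn.Conv2d(64 + 32, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, feat_ch, 1)  # per-pixel feature channels

    def forward(self, x):
        f1 = self.enc1(x)                  # full-resolution features
        f2 = self.enc2(self.pool(f1))      # half-resolution features
        f2_up = self.up(f2)                # back to full resolution
        fused = self.dec1(torch.cat([f2_up, f1], dim=1))  # fuse via skip connection
        return self.head(fused)

picture = torch.randn(1, 3, 100, 100)      # picture to be detected
feature_map = TinyUNet()(picture)          # shape (1, 10, 100, 100)
```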
In an embodiment of the present invention, within the score range, the higher the confidence score of a pixel point belonging to a text pixel point, the higher the probability that the pixel point belongs to a text pixel point and the lower the probability that it belongs to the picture background; conversely, the lower the confidence score, the lower the probability that the pixel point belongs to a text pixel point and the higher the probability that it belongs to the picture background.
For example, for a value range of 0 to 1, a confidence score of 0 indicates that the pixel point belongs entirely to the background and not to the text, and a confidence score of 1 indicates that the pixel point belongs entirely to the text and not to the background. The closer the confidence score is to 1, the higher the probability that the pixel point is a text pixel point and the lower the probability that it belongs to the background; the closer the confidence score is to 0, the higher the probability that the pixel point belongs to the background and the lower the probability that it is a text pixel point. If the confidence score of a pixel point is analyzed to be 0.1, it can be determined that the pixel point has a high probability of belonging to the background and a low probability of belonging to the text.
In this embodiment, a text pixel point is a pixel point representing a point on the text. Generally, if the picture to be detected contains text, the text is formed by a combination of a plurality of text pixel points. The confidence score of a pixel point belonging to a text pixel point represents the probability that the pixel point is a text pixel point.
In an embodiment of the invention, a plurality of convolution layers in the UNet network structure deep learning model can be adopted to carry out convolution operation on the extracted feature information of the image to be detected, and the confidence score of each pixel point as a character pixel point is obtained. Of course, the embodiment of the present invention may also adopt other network models to extract the feature map corresponding to the picture to be detected, and the type of the network model is not specifically limited here.
Referring to step S104, in an embodiment of the present invention, a convolutional neural network may be used to perform pixel segmentation on the feature map to obtain a plurality of pixel points, and the confidence score of each pixel point belonging to a text pixel point is then analyzed. The convolutional neural network may be a UNet network; of course, other convolutional neural networks may also be used, which is not specifically limited in this embodiment of the present invention.
Referring to step S106, in an embodiment of the present invention, when the position information, within the text box to which they belong, of the pixel points within the preset confidence score range is extracted, the pixel points whose confidence scores are greater than a preset score are first obtained from the plurality of pixel points obtained by segmentation, that is, the pixel points with a high probability of belonging to text are selected; the coordinate values of those pixel points within the text boxes to which they belong are then extracted. For example, if the confidence score range is greater than 0.7 and less than 1, the preset score is set to 0.7; a pixel point whose confidence score is greater than 0.7 has a high probability of belonging to text and can be regarded as a text pixel point.
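A minimal sketch of this thresholding step, assuming the per-pixel confidence scores are held in a NumPy array with values in [0, 1] and that 0.7 is the preset score:

```python
import numpy as np

def select_text_pixels(score_map: np.ndarray, preset_score: float = 0.7):
    """Return a mask and the (row, col) coordinates of pixels whose confidence exceeds the preset score."""
    mask = score_map > preset_score           # pixels with a high probability of belonging to text
    ys, xs = np.nonzero(mask)
    return mask, list(zip(ys.tolist(), xs.tolist()))

# Example: a 4x4 confidence map in which only two pixels pass the 0.7 threshold.
scores = np.array([[0.1, 0.2, 0.1, 0.0],
                   [0.2, 0.9, 0.8, 0.1],
                   [0.1, 0.3, 0.2, 0.1],
                   [0.0, 0.1, 0.1, 0.0]])
mask, coords = select_text_pixels(scores)     # coords == [(1, 1), (1, 2)]
```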
Still referring to step S106, in an embodiment of the present invention, when the plurality of connected domains of pixel points are formed according to the extracted position information of the pixel points, it may be judged, according to the extracted coordinate values of any pixel point, whether that pixel point is cascaded with the pixel points in specified directions within the text box to which it belongs; if so, that pixel point is cascaded with those pixel points, and a plurality of connected domains are then formed from the mutually cascaded pixel points, where the pixel points belonging to one connected domain are cascaded with each other.
In this embodiment, the pixel points in the specified directions within the text box to which a pixel point belongs may be the four pixel points in the up, down, left and right directions, or the eight pixel points in the up, down, left, right, upper-left, upper-right, lower-left and lower-right directions.
In an embodiment of the present invention, when the plurality of connected domains of pixel points are formed according to the extracted position information of the pixel points, a binarization process may first be performed on the feature map, so that the plurality of connected domains of pixel points are formed in the binarized image according to the extracted position information of the pixel points.
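The following sketch shows one way to form connected domains on the binarized mask. It uses plain 8-neighbour adjacency as a stand-in for the cascading relation; in the patent's method the predicted link values between a pixel point and its neighbours would further decide which adjacent pixels are actually cascaded.

```python
import numpy as np
from collections import deque

def connected_domains(binary_mask: np.ndarray):
    """Group cascaded text pixels into connected domains by 8-neighbour flood fill."""
    h, w = binary_mask.shape
    labels = np.zeros((h, w), dtype=np.int32)        # 0 = background / not yet visited
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary_mask[y, x] and labels[y, x] == 0:
                count += 1                            # start a new connected domain
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary_mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count    # cascade the neighbouring text pixel
                            queue.append((ny, nx))
    return labels, count                              # label map and number of connected domains
```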
Referring to step S108, in an embodiment of the present invention, after the pixel points in the same connected domain are merged, the position information of the corresponding text box can be located according to the position information of the merged pixel points, and the text region on the picture to be detected is then determined according to the position information of the text box. For example, the position of the text box is located according to the coordinate values of the merged pixel points relative to the text box to which they belong. Each piece of text corresponds to one text box, and the determined text region is the combination of a plurality of text boxes.
In this embodiment, the position information of the text may be defined by the position information of the text box surrounding the text, and the text box is usually rectangular, such as the minimum bounding rectangle around the text. When the position information of the corresponding text box is located, the position of the text box can be represented by the coordinates of its four vertices, that is, the position information of the text box is located by outputting the coordinate values of the four vertices. Alternatively, it may be represented by the coordinates of one vertex of the text box together with the width and height of the text box.
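One concrete way to obtain such a box from the merged pixel points is to take the minimum bounding rectangle of each connected domain, for example with OpenCV. This is an illustrative choice under that assumption, not the patent's prescribed procedure, which may also locate the box from the per-pixel box regressions.

```python
import cv2
import numpy as np

def text_boxes_from_labels(labels: np.ndarray):
    """Return, for every connected domain, the four vertices of its minimum bounding rectangle."""
    boxes = []
    for label in range(1, int(labels.max()) + 1):
        ys, xs = np.nonzero(labels == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)  # (x, y) point set of the domain
        rect = cv2.minAreaRect(points)     # ((centre_x, centre_y), (width, height), angle)
        boxes.append(cv2.boxPoints(rect))  # four corner coordinates of the rotated text box
    return boxes
```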
After the text region on the picture is detected, in an embodiment of the present invention, the text in the text region may be further recognized, for example by OCR (Optical Character Recognition), so as to identify the specific text content contained in the text region of the picture. For organizations concerned with network security, this approach can be used to judge whether illegal text exists in an image. For example, for pictures containing text related to pornography, gambling or drugs, the scheme of the invention can quickly and accurately detect the text region concerned, OCR technology can then be used to recognize the specific text content, and the websites or image sources that published the illegal text pictures can be traced so that corresponding measures can be taken.
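A minimal sketch of this follow-up recognition step, assuming the pytesseract wrapper around Tesseract is available, that `box` is the 4-by-2 vertex array produced in the sketch above, and that the box is roughly axis-aligned (a rotated box would first need to be rectified):

```python
import pytesseract

def recognize_region(image, box):
    """Crop the axis-aligned extent of a detected text box and run OCR on the crop."""
    x1, y1 = box[:, 0].min(), box[:, 1].min()
    x2, y2 = box[:, 0].max(), box[:, 1].max()
    crop = image[int(y1):int(y2) + 1, int(x1):int(x2) + 1]
    return pytesseract.image_to_string(crop, lang="chi_sim+eng")
```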
Referring to fig. 2, a text detection method according to an embodiment of the present invention is described by taking a deep learning model of a UNet network structure as an example.
After a picture to be detected (namely an original picture) is input into the deep learning model of the UNet network structure, the deep learning model extracts the feature information of the picture to be detected; the extracted feature information is then fed into the convolution units, where up-sampling, down-sampling and the corresponding convolution operations are performed on the picture to be detected according to the extracted feature information, so as to obtain the feature map corresponding to the picture to be detected. The conv block pool shown in fig. 2 represents the convolution operation corresponding to down-sampling, and the conv block up pool represents the convolution operation corresponding to up-sampling. After the feature map is subjected to pixel segmentation to obtain a plurality of pixel points, the confidence score (score) and the link score of each pixel point belonging to a text pixel point can be analyzed, where score represents the confidence that a pixel point is a text pixel point, and the 4 after the link score represents the confidence scores of the pixel point being cascaded with the pixel points in the four directions (up, down, left and right) within the text box. Further, the position information, relative to the text box to which it belongs, of each pixel point whose confidence score is greater than the preset score is extracted, where the 5 after box represents five values, namely the distances from the pixel point to the upper, lower, left and right boundaries of the text box and a rotation angle.
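The sketch below shows how per-pixel outputs of this kind (1 score channel, 4 link channels for the up/down/left/right neighbours, and 5 geometry values: distances to the four box boundaries plus a rotation angle) could be decoded for a single pixel. The channel layout, thresholds and names are assumptions made for illustration, not values taken from the patent.

```python
import numpy as np

def decode_pixel(pred: np.ndarray, y: int, x: int,
                 score_thresh: float = 0.7, link_thresh: float = 0.5):
    """Decode one pixel of a (10, H, W) prediction: 1 score + 4 link scores + 5 box values."""
    score = pred[0, y, x]                      # confidence of being a text pixel point
    if score <= score_thresh:
        return None                            # background pixel, nothing to decode
    links = pred[1:5, y, x] > link_thresh      # cascade with the up/down/left/right neighbour?
    d_top, d_bottom, d_left, d_right, angle = pred[5:10, y, x]
    box = (x - d_left, y - d_top, x + d_right, y + d_bottom)   # box implied before rotation
    return {"score": float(score), "links": links.tolist(),
            "box": tuple(float(v) for v in box), "angle": float(angle)}
```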
Based on the same inventive concept, an embodiment of the present invention further provides a text detection apparatus based on pixel segmentation and merging, and fig. 3 illustrates a schematic structural diagram of an apparatus for text detection based on pixel segmentation and merging according to an embodiment of the present invention. Referring to fig. 3, the apparatus 300 for text detection based on pixel segmentation and merging includes a generation module 310, an analysis module 320, a formation module 330, and a merging module 340.
The functions of the components or devices of the text detection device 300 based on pixel division and combination and the connection relationship between the components will now be described:
the generating module 310 is adapted to extract feature information of the picture to be detected, and generate a feature map corresponding to the picture to be detected according to the extracted feature information;
the analysis module 320 is coupled with the generation module 310 and is suitable for performing pixel segmentation on the feature map to obtain a plurality of pixel points and analyzing the confidence scores of the pixel points belonging to the characters;
a forming module 330, coupled to the analyzing module 320, adapted to extract position information of the pixel points within the preset confidence score range within the text box to which the pixel points belong, and form a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
the merging module 340 is coupled to the forming module 330, and is adapted to merge pixels in the same connected domain, and determine a text region on the to-be-detected picture by using the merged pixels.
In an embodiment of the present invention, the generating module 310 is further adapted to extract feature information of the to-be-detected picture based on the UNet network structure deep learning model, and perform upsampling, downsampling and corresponding convolution operations on the to-be-detected picture according to the extracted feature information, so as to obtain a feature map corresponding to the to-be-detected picture.
In an embodiment of the present invention, the higher the confidence score of a pixel point belonging to a text pixel point is, the higher the probability of the pixel point belonging to the text pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of a pixel belonging to a text pixel is, the smaller the probability of the pixel belonging to the text pixel is and the larger the probability of the pixel belonging to the picture background is.
The embodiment of the invention also provides another character detection device based on pixel segmentation and merging, and fig. 4 is a schematic structural diagram of a character detection device based on pixel segmentation and merging according to another embodiment of the invention. Referring to fig. 4, the forming module 330 in the apparatus 300 for detecting text based on pixel division and combination further includes an acquiring unit 331, an extracting unit 332, a determining unit 333, and a forming unit 334.
The obtaining unit 331 is adapted to obtain a pixel point with a confidence score larger than a preset score from the plurality of pixel points obtained by segmentation.
The extracting unit 332 is coupled to the obtaining unit 331, and is adapted to extract the coordinate value of the pixel point greater than the preset score in the text box to which the pixel point belongs.
The determining unit 333 is coupled to the extracting unit 332, and adapted to determine whether any pixel is cascaded with a pixel in a designated direction in the text box to which the pixel belongs according to the extracted coordinate value of the any pixel.
The forming unit 334 is coupled to the determining unit 333, and is adapted to cascade the arbitrary pixel point and the pixel point in the designated direction in the text box to which the arbitrary pixel point belongs if the determining unit 333 determines that the arbitrary pixel point is cascaded with the pixel point in the designated direction in the text box to which the arbitrary pixel point belongs, and further form a plurality of connected domains according to the pixel points that are cascaded with each other, where the pixel points belonging to the same connected domain are cascaded with each other.
In an embodiment of the present invention, the forming module 330 is further adapted to perform binarization processing on the feature map, and form a plurality of connected domains of the pixel points in the binarized image according to the extracted position information of the pixel points.
In an embodiment of the present invention, the merging module 340 is further adapted to merge the pixel points in the same connected domain, locate the position information of the corresponding text box according to the position information of the merged pixel points, and determine the text region on the to-be-detected picture according to the position information of the text box.
Embodiments of the present invention further provide a computer storage medium storing computer program code, which, when run on a computing device, causes the computing device to perform the method for text detection based on pixel segmentation and merging in any of the above embodiments.
An embodiment of the present invention further provides a computing device, including: a processor; a memory storing computer program code; the computer program code, when executed by a processor, causes a computing device to perform a method of text detection based on pixel segmentation merging as in any of the embodiments above.
According to any one or a combination of the above preferred embodiments, the following advantages can be achieved by the embodiments of the present invention:
in the embodiment of the invention, the feature information of the picture to be detected is extracted, and the feature map corresponding to the picture to be detected is generated according to the extracted feature information; the feature map is then subjected to pixel segmentation to obtain a plurality of pixel points, and the confidence score of each pixel point belonging to a text pixel point is analyzed; the position information, within the text box to which they belong, of the pixel points whose confidence scores fall within the preset range is extracted, and a plurality of connected domains of pixel points are formed according to the extracted position information; finally, the pixel points in the same connected domain are merged, and the text region on the picture to be detected is determined by using the merged pixel points. The embodiment of the invention thus performs pixel segmentation on the feature map of the picture to be detected, forms a plurality of connected domains according to the position information of the text pixel points, and merges the pixel points within each connected domain; since each connected domain corresponds to one piece of text, the text region on the picture to be detected can be determined effectively and accurately from the merged pixel points. Furthermore, because the text region is determined from connected domains formed among pixel points, rotated text, long text and the like on the picture to be detected can be detected effectively.
It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.
Those of ordinary skill in the art will understand that: the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention when the instructions are executed. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a computing device, e.g., a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.
The embodiments of the invention further provide the following technical solutions. A1. A character detection method based on pixel segmentation and merging, comprising the following steps:
extracting the characteristic information of a picture to be detected, and generating a characteristic diagram corresponding to the picture to be detected according to the extracted characteristic information;
carrying out pixel segmentation on the characteristic image to obtain a plurality of pixel points, and analyzing confidence scores of the pixel points belonging to the character pixel points;
extracting the position information of the pixel points within the preset confidence score range in the text box to which the pixel points belong, and forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
and merging the pixel points in the same connected domain, and determining the character region on the picture to be detected by using the merged pixel points.
A2, the method according to A1, wherein the extracting the feature information of the picture to be detected, and generating the feature map corresponding to the picture to be detected according to the extracted feature information includes:
extracting the characteristic information of the picture to be detected based on a UNet network structure deep learning model;
and performing up-sampling, down-sampling and corresponding convolution operation on the picture to be detected according to the extracted feature information to obtain a feature map corresponding to the picture to be detected.
A3, the method according to A1 or A2, wherein,
the higher the confidence score of the pixel point belonging to the character pixel point is, the higher the probability of the pixel point belonging to the character pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of the pixel point belonging to the character pixel point is, the smaller the probability of the pixel point belonging to the character pixel point is and the larger the probability of the pixel point belonging to the picture background is.
A4, the method according to A1 or A2, wherein extracting the position information of the pixel points in the preset confidence score range in the text box to which the pixel points belong comprises:
obtaining pixel points with confidence scores larger than preset scores from a plurality of pixel points obtained by segmentation;
and extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.
A5, the method according to A4, wherein the forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points comprises:
judging whether any pixel point is cascaded with the pixel point in the appointed direction in the character frame to which the pixel point belongs according to the extracted coordinate value of the any pixel point;
if yes, the arbitrary pixel point and the pixel point in the appointed direction in the text frame to which the arbitrary pixel point belongs are cascaded;
and forming a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are mutually cascaded.
A6, the method according to A1 or A2, wherein the forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points comprises:
and carrying out binarization processing on the feature map, and forming a plurality of connected domains of the pixel points in the image after binarization processing according to the extracted position information of the pixel points.
A7, the method according to A1 or A2, wherein merging the pixels in the same connected domain, and determining the text region on the picture to be detected by using the merged pixels comprises:
merging the pixel points in the same connected domain, and positioning the position information of the corresponding text frame according to the position information of the merged pixel points;
and determining the character area on the picture to be detected according to the position information of the character frame.
B8, a character detecting device based on pixel segmentation and merging, comprising:
the generating module is suitable for extracting the characteristic information of the picture to be detected and generating a characteristic graph corresponding to the picture to be detected according to the extracted characteristic information;
the analysis module is suitable for carrying out pixel segmentation on the characteristic image to obtain a plurality of pixel points and analyzing the confidence score of each pixel point belonging to a character pixel point;
the forming module is suitable for extracting the position information of the pixel points within the preset confidence score value range in the text box to which the pixel points belong, and forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
and the merging module is suitable for merging the pixel points in the same connected domain, and determining the character region on the picture to be detected by using the merged pixel points.
B9, the apparatus of B8, wherein the generating module is further adapted to:
extracting the characteristic information of the picture to be detected based on a UNet network structure deep learning model;
and performing up-sampling, down-sampling and corresponding convolution operation on the picture to be detected according to the extracted feature information to obtain a feature map corresponding to the picture to be detected.
B10, the device according to B8 or B9, wherein,
the higher the confidence score of the pixel point belonging to the character pixel point is, the higher the probability of the pixel point belonging to the character pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of the pixel point belonging to the character pixel point is, the smaller the probability of the pixel point belonging to the character pixel point is and the larger the probability of the pixel point belonging to the picture background is.
B11, the apparatus according to B8 or B9, wherein the formation module comprises an acquisition unit and an extraction unit,
the acquisition unit is suitable for acquiring pixel points with confidence scores larger than preset scores from the plurality of pixel points obtained by segmentation;
the extraction unit is suitable for extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.
B12, the apparatus of B11, wherein the forming module further comprises:
the judging unit is suitable for judging whether any pixel point is cascaded with the pixel point in the appointed direction in the text box to which the pixel point belongs according to the extracted coordinate value of the any pixel point;
the forming unit is suitable for cascading the any pixel point and the pixel point in the appointed direction in the text box to which the any pixel point belongs if the judging unit determines that the any pixel point and the pixel point in the appointed direction in the text box to which the any pixel point belongs are cascaded;
and forming a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are mutually cascaded.
B13, the device according to B8 or B9, wherein the forming module is further adapted to:
and carrying out binarization processing on the feature map, and forming a plurality of connected domains of the pixel points in the image after binarization processing according to the extracted position information of the pixel points.
B14, the apparatus according to B8 or B9, wherein the merging module is further adapted to:
merging the pixel points in the same connected domain, and positioning the position information of the corresponding text frame according to the position information of the merged pixel points;
and determining the character area on the picture to be detected according to the position information of the character frame.
C15, a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the method for pixel segmentation merging based text detection of any one of a1-a 7.
C16, a computing device, comprising: a processor; a memory storing computer program code; the computer program code, when executed by the processor, causes the computing device to perform the method for pixel segmentation and merging based text detection of any one of A1-A7.

Claims (10)

1. A character detection method based on pixel segmentation and merging comprises the following steps:
extracting the characteristic information of a picture to be detected, and generating a characteristic diagram corresponding to the picture to be detected according to the extracted characteristic information;
carrying out pixel segmentation on the characteristic image to obtain a plurality of pixel points, and analyzing confidence scores of the pixel points belonging to the character pixel points;
extracting the position information of the pixel points within the preset confidence score range in the text box to which the pixel points belong, and forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
and merging the pixel points in the same connected domain, and determining the character region on the picture to be detected by using the merged pixel points.
2. The method according to claim 1, wherein the extracting feature information of the picture to be detected and generating the feature map corresponding to the picture to be detected according to the extracted feature information includes:
extracting the characteristic information of the picture to be detected based on a UNet network structure deep learning model;
and performing up-sampling, down-sampling and corresponding convolution operation on the picture to be detected according to the extracted feature information to obtain a feature map corresponding to the picture to be detected.
3. The method of claim 1 or 2,
the higher the confidence score of the pixel point belonging to the character pixel point is, the higher the probability of the pixel point belonging to the character pixel point is and the lower the probability of the pixel point belonging to the picture background is;
the smaller the confidence score of the pixel point belonging to the character pixel point is, the smaller the probability of the pixel point belonging to the character pixel point is and the larger the probability of the pixel point belonging to the picture background is.
4. The method according to claim 1 or 2, wherein extracting the position information of the pixel points in the preset confidence score range in the text box to which the pixel points belong comprises:
obtaining pixel points with confidence scores larger than preset scores from a plurality of pixel points obtained by segmentation;
and extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.
5. The method of claim 4, wherein the forming of the plurality of connected domains of the pixel points according to the extracted position information of the pixel points comprises:
judging whether any pixel point is cascaded with the pixel point in the appointed direction in the character frame to which the pixel point belongs according to the extracted coordinate value of the any pixel point;
if yes, the arbitrary pixel point and the pixel point in the appointed direction in the text frame to which the arbitrary pixel point belongs are cascaded;
and forming a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are mutually cascaded.
6. The method according to claim 1 or 2, wherein the forming of the plurality of connected domains of the pixel point according to the extracted position information of the pixel point comprises:
and carrying out binarization processing on the feature map, and forming a plurality of connected domains of the pixel points in the image after binarization processing according to the extracted position information of the pixel points.
7. The method according to claim 1 or 2, wherein merging the pixels in the same connected domain, and determining the text region on the picture to be detected by using the merged pixels comprises:
merging the pixel points in the same connected domain, and positioning the position information of the corresponding text frame according to the position information of the merged pixel points;
and determining the character area on the picture to be detected according to the position information of the character frame.
8. A text detection device based on pixel segmentation and merging, comprising:
the generating module is suitable for extracting the characteristic information of the picture to be detected and generating a characteristic graph corresponding to the picture to be detected according to the extracted characteristic information;
the analysis module is suitable for carrying out pixel segmentation on the characteristic image to obtain a plurality of pixel points and analyzing the confidence score of each pixel point belonging to a character pixel point;
the forming module is suitable for extracting the position information of the pixel points within the preset confidence score value range in the text box to which the pixel points belong, and forming a plurality of connected domains of the pixel points according to the extracted position information of the pixel points;
and the merging module is suitable for merging the pixel points in the same connected domain, and determining the character region on the picture to be detected by using the merged pixel points.
9. A computer storage medium having computer program code stored thereon which, when run on a computing device, causes the computing device to perform the character detection method based on pixel segmentation and merging of any one of claims 1 to 7.
10. A computing device, comprising: a processor; and a memory storing computer program code which, when executed by the processor, causes the computing device to perform the character detection method based on pixel segmentation and merging of any one of claims 1 to 7.
CN201910114195.5A 2019-02-14 2019-02-14 Character detection method and device based on pixel segmentation and merging Pending CN111563505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114195.5A CN111563505A (en) 2019-02-14 2019-02-14 Character detection method and device based on pixel segmentation and merging

Publications (1)

Publication Number Publication Date
CN111563505A true CN111563505A (en) 2020-08-21

Family

ID=72071317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114195.5A Pending CN111563505A (en) 2019-02-14 2019-02-14 Character detection method and device based on pixel segmentation and merging

Country Status (1)

Country Link
CN (1) CN111563505A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090003700A1 (en) * 2007-06-27 2009-01-01 Jing Xiao Precise Identification of Text Pixels from Scanned Document Images
CN103593329A (en) * 2012-08-17 2014-02-19 腾讯科技(深圳)有限公司 Text image rearrangement method and system
CN103679168A (en) * 2012-08-30 2014-03-26 北京百度网讯科技有限公司 Detection method and detection device for character region
JP2014085841A (en) * 2012-10-24 2014-05-12 Glory Ltd Character segmentation device, character segmentation method, and character recognition device
CN105868759A (en) * 2015-01-22 2016-08-17 阿里巴巴集团控股有限公司 Method and apparatus for segmenting image characters
CN104794504A (en) * 2015-04-28 2015-07-22 浙江大学 Graphic pattern text detection method based on deep learning
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107784301A (en) * 2016-08-31 2018-03-09 百度在线网络技术(北京)有限公司 Method and apparatus for identifying character area in image
KR101937398B1 (en) * 2017-10-20 2019-01-10 김학선 System and method for extracting character in image data of old document
CN108460385A (en) * 2018-03-02 2018-08-28 山东超越数控电子股份有限公司 A kind of Document Segmentation method and apparatus
CN108764240A (en) * 2018-03-28 2018-11-06 中科博宏(北京)科技有限公司 Computer vision identity card Character segmentation identification technology based on character relative size

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101347A (en) * 2020-08-27 2020-12-18 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN112101347B (en) * 2020-08-27 2021-04-30 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN112085022A (en) * 2020-09-09 2020-12-15 上海蜜度信息技术有限公司 Method, system and equipment for recognizing characters
CN112085022B (en) * 2020-09-09 2024-02-13 上海蜜度科技股份有限公司 Method, system and equipment for recognizing characters
CN112364863A (en) * 2020-10-20 2021-02-12 苏宁金融科技(南京)有限公司 Character positioning method and system for license document
CN112580655A (en) * 2020-12-25 2021-03-30 特赞(上海)信息科技有限公司 Text detection method and device based on improved CRAFT

Similar Documents

Publication Publication Date Title
WO2018103608A1 (en) Text detection method, device and storage medium
CN108108731B (en) Text detection method and device based on synthetic data
CN105868758B (en) method and device for detecting text area in image and electronic equipment
CN111563505A (en) Character detection method and device based on pixel segmentation and merging
CN110032998B (en) Method, system, device and storage medium for detecting characters of natural scene picture
CN109918987B (en) Video subtitle keyword identification method and device
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
KR20160132842A (en) Detecting and extracting image document components to create flow document
JP6075190B2 (en) Image processing method and apparatus
CN111259878A (en) Method and equipment for detecting text
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN112070649B (en) Method and system for removing specific character string watermark
JP2006067585A (en) Method and apparatus for specifying position of caption in digital image and extracting thereof
CN112949455B (en) Value-added tax invoice recognition system and method
CN113591746B (en) Document table structure detection method and device
CN110570442A (en) Contour detection method under complex background, terminal device and storage medium
CN111652144A (en) Topic segmentation method, device, equipment and medium based on target region fusion
CN114981838A (en) Object detection device, object detection method, and object detection program
JP2019220014A (en) Image analyzing apparatus, image analyzing method and program
CN118430005A (en) Transmission tower drawing information extraction method, device and equipment
CN111160368A (en) Method, device and equipment for detecting target in image and storage medium
CN114355467A (en) Detection model establishment and multi-image-based track foreign matter detection method
JP2021051581A (en) Similar region detecting apparatus, similar region detecting method, and program
CN117541546A (en) Method and device for determining image cropping effect, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination