CN110717489B - Method, device and storage medium for identifying text region of OSD (on Screen display) - Google Patents

Method, device and storage medium for identifying text region of OSD (on Screen display)

Info

Publication number
CN110717489B
CN110717489B (application number CN201910885665.8A)
Authority
CN
China
Prior art keywords
bounding boxes
osd
text region
bounding
bounding box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910885665.8A
Other languages
Chinese (zh)
Other versions
CN110717489A (en)
Inventor
郭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910885665.8A priority Critical patent/CN110717489B/en
Priority to PCT/CN2019/118284 priority patent/WO2021051604A1/en
Publication of CN110717489A publication Critical patent/CN110717489A/en
Application granted granted Critical
Publication of CN110717489B publication Critical patent/CN110717489B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The application provides a method for identifying the text region of an OSD (On-Screen Display), which relates to the technical field of image recognition and comprises the following steps: preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region; packaging the discrete points into a plurality of polygonal bounding boxes and screening the bounding boxes according to preset conditions; enlarging the screened bounding boxes proportionally with a dilation algorithm; taking the union of the remaining connected bounding boxes according to preset conditions; and regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized. By segmenting the OSD text with an edge-detection-based segmentation method and obtaining the segmented region, the application shields detection from the interference of dynamic illumination changes.

Description

Method, device and storage medium for identifying text region of OSD (on Screen display)
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method, an apparatus, and a storage medium for recognizing text regions of OSD.
Background
At present, OSD (On Screen Display, on-screen menu) technology is widely used in the market and applied to CRT/LCD displays to generate special fonts or graphics on the screen so that the user can obtain certain information. The text is nested with the video in each frame, and the text embedded in an image is an important carrier of the image's semantic content. If the characters can be extracted and recognized automatically, a machine can automatically understand the content of pictures and classify them, and, with the help of mature text-search technology, pictures can be labelled and retrieved by text, which provides a route to content-based image and video search. How to work back from the text to the range it occupies in the image is the problem to be solved.
Existing image text segmentation techniques fall mainly into three types: threshold-based methods, clustering-based methods and statistical-model-based methods.
These character segmentation methods use only the local low-level gray-level or color information of the image and do not consider the spatial or global context information of the characters. They cannot remove the interfering foreground of the font region, nor shield detection from the interference of dynamic illumination changes.
In view of the above, there is a need for a text region recognition scheme for OSD that eliminates the font region interference.
Disclosure of Invention
The application provides a method for identifying the text region of an OSD, an electronic device and a computer-readable storage medium, which mainly segment the OSD text region with an edge-detection-based segmentation method and obtain the segmented region, shield interference, lock the OSD text segmentation region through adaptive binarization, and complete text extraction.
In order to achieve the above object, the present application provides a method for identifying the text region of an OSD, applied to an electronic device, the method comprising: S110, preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region; S120, packaging the discrete points into a plurality of polygonal bounding boxes, calculating the areas of the bounding boxes, and screening the bounding boxes according to preset conditions; S130, enlarging the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another; S140, after the contours of the bounding boxes are connected, taking the union of the remaining connected bounding boxes according to preset conditions; S150, regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
Preferably, in step S110, the discrete points of the contour of the text region are obtained by findContours.
Preferably, after the step S130, if the contour sharpness of the text region does not reach the preset threshold, steps S120 and S130 are repeated until the contour sharpness of the text region reaches the preset threshold.
Preferably, the method further comprises screening the bounding boxes by setting erosion particle size and concentration parameters, and connecting the contours of the bounding boxes to one another by setting the contour layer-number parameter of the connected region.
Preferably, in step S120, the step of packaging the discrete points into a plurality of polygonal bounding boxes comprises: obtaining a plurality of convex hulls from the discrete points through convexHull, and packaging the obtained convex hulls one by one into polygonal bounding boxes.
To achieve the above object, the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a recognition program for the text region of an OSD which, when executed by the processor, implements the following steps: S110, preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region; S120, packaging the discrete points into a plurality of polygonal bounding boxes, calculating the areas of the bounding boxes, and screening the bounding boxes according to preset conditions; S130, enlarging the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another; S140, after the contours of the bounding boxes are connected, taking the union of the remaining connected bounding boxes according to preset conditions; S150, regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
Preferably, in step S110, the discrete points of the contour of the text region are obtained by findContours. Preferably, after step S130, if the contour sharpness of the text region does not reach the preset threshold, steps S120 and S130 are repeated until the contour sharpness of the text region reaches the preset threshold. Preferably, the method further comprises screening the bounding boxes by setting erosion particle size and concentration parameters, and connecting the contours of the bounding boxes to one another by setting the contour layer-number parameter of the connected region.
In addition, in order to achieve the above object, the present application also provides a computer-readable storage medium storing a computer program including a program for identifying a text region of an OSD, which when executed by a processor, implements the steps of the above method for identifying a text region of an OSD.
According to the method for identifying the text region of the OSD, the electronic device and the computer readable storage medium, the text region of the OSD is segmented and obtained by the segmentation method based on edge detection, and after interference is shielded, the text region of the OSD is locked by the self-adaptive binarization processing, so that the text extraction is completed. The beneficial effects are as follows:
1. the effect of eliminating the interfering foreground of the font region is realized by adopting a moving-object detection method;
2. by using a segmentation method based on edge detection, OSD characters are segmented out and segmented areas are obtained, so that interference of illumination dynamic change on detection is shielded;
3. clear recognition of the Chinese characters in the OSD can be realized.
Drawings
FIG. 1 is a flowchart of a text region identification method for OSD according to a preferred embodiment of the application;
FIG. 2 is a schematic diagram of an electronic device according to a preferred embodiment of the present application;
the achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application provides a method for identifying text regions of an OSD. Referring to fig. 1, a flowchart of a method for identifying text regions of OSD according to a preferred embodiment of the present application is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
The application segments the Chinese character region in the OSD with an edge-detection-based segmentation method, obtains the segmented region, shields interference, then locks the OSD text segmentation region through adaptive binarization, and completes text extraction.
In addition, in the edge-based image segmentation method adopted in the application, an edge is a set of continuous pixels on the boundary between two different regions of an image; it reflects the discontinuity of local image features and shows abrupt changes in image characteristics such as gray level, color and texture. In general, an edge-based segmentation method means edge detection based on gray values, which rests on the observation that the gray values at an edge exhibit a step-type or roof-type change. The gray values of the pixels on either side of a step-type edge differ markedly, while a roof-type edge lies at the turning point where the gray value rises and then falls. Based on this property, edge detection can be performed with differential operators, that is, the edges are determined from the extreme values of the first derivative and the zero crossings of the second derivative; in a concrete implementation this can be done by convolving the image with a template.
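A minimal Python/OpenCV sketch of such differential-operator edge detection, using the Sobel operator and an illustrative fixed threshold (the file name and the threshold value are assumptions, not values from the application):

```python
import cv2
import numpy as np

# Hypothetical input: one decoded OSD frame in grayscale.
img = cv2.imread("osd_frame.png", cv2.IMREAD_GRAYSCALE)

# Convolve with horizontal and vertical derivative templates (first derivative).
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)

# Gradient magnitude; its extreme values indicate candidate edges.
magnitude = cv2.magnitude(gx, gy)
edges_simple = (magnitude > 100).astype(np.uint8) * 255  # illustrative threshold
```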
In this embodiment, the method for identifying text regions of OSD includes: step S110 to step S150.
S110, preprocessing an OSD file to obtain a frame-by-frame image, performing binarization threshold filtering on the frame-by-frame image by using a canny algorithm, performing frame contour acquisition on the filtered image, and finding out discrete points of the contour of a text region; thus, the effect of preliminarily deleting the area with darker light is realized.
Wherein, the frame-by-frame image is obtained by decoding the OSD image in real time.
The task of edge detection is completed by the canny algorithm. In image edge detection, noise suppression and accurate edge localization cannot both be fully satisfied: some edge detection algorithms remove noise through smoothing filters, which increases the uncertainty of edge localization, while increasing the sensitivity of the edge detection operator to edges also increases its sensitivity to noise. The canny algorithm strives to find the best compromise between noise immunity and accurate localization, and has the characteristics of high signal-to-noise ratio, high localization accuracy and single-edge response.
In other words, the common canny algorithm blurs the image P and then convolves it with a bank of orthogonal differential filters (e.g., Prewitt filters) to produce images H and V containing the derivatives in the horizontal and vertical directions respectively, from which the gradient direction and magnitude are calculated for each pixel (i, j). If the magnitude exceeds a threshold, the pixel is marked as an edge (this simple thresholding, by itself, does not work well).
In the application, binarization of the image is realized through double thresholds: the gray value of each pixel is set to 0 or 255, so that the whole image shows only a black-and-white visual effect. An image contains a target object, background and noise; to extract the target object directly from a multi-valued digital image, a common method is to set a threshold T and divide the image data into two parts: the pixels greater than T and the pixels less than T. This is the most particular way of studying gray-scale transformations and is called binarization of the image. An adaptive threshold does not require a fixed threshold; instead, it is set by a corresponding adaptive method according to the local features of the image, and binarization is then performed. The application locks the OSD text segmentation region through adaptive binarization.
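A minimal sketch of the fixed versus adaptive thresholding described above; the block size and constant are illustrative assumptions, not values from the application:

```python
import cv2

gray = cv2.imread("osd_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame

# Global binarization with a fixed threshold T.
_, fixed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Adaptive binarization: the threshold is set per local neighbourhood.
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
```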
Adaptive thresholding with two thresholds TH and TL is performed, which balances the two problems of overly rich edges at a low threshold and missing edges at a high threshold. The high threshold is used to detect the important, significant lines and contours in the image, while the low threshold ensures that details are not lost; the low threshold detects richer edges, but many of them are not of interest. To satisfy the single-edge-response criterion, a search algorithm finally retains the low-threshold lines that overlap the high-threshold edges and deletes the other lines. The high threshold is used to judge the position of the true edges, and the low threshold is used to repair the judged true edges: the high threshold is processed to obtain a high-threshold image, the true edge endpoints in the high-threshold image are searched, and the edges near those endpoints are patched in based on the low threshold.
In a specific embodiment, the step of performing threshold filtering on the frame contour by using the canny algorithm includes: S111, carrying out convolution smoothing on the image with a Gaussian filter and preliminarily removing some dimly lit regions to obtain the processed image guass; S112, calculating the magnitude and direction of the gradient using finite differences of the first-order partial derivatives; S113, performing non-maximum suppression on the gradient magnitude, i.e., traversing the image and setting a pixel value to 0 if the gray value of that pixel is not the maximum compared with the gray values of the preceding and following pixels along the gradient direction, i.e., it is not an edge; S114, detecting and connecting edges with the double-threshold algorithm.
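A minimal sketch of steps S111-S114; cv2.Canny internally performs the gradient computation, non-maximum suppression and double-threshold edge linking, and the threshold values below are illustrative assumptions:

```python
import cv2

gray = cv2.imread("osd_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame

guass = cv2.GaussianBlur(gray, (5, 5), 0)  # S111: Gaussian convolution smoothing
edges = cv2.Canny(guass, 50, 150)          # S112-S114: TL = 50, TH = 150 (assumed)
```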
Here, the thresholds apply to the gradient. The canny algorithm with a high threshold takes the maximum-gradient contour as the edge decision. That is, when the two thresholds are computed from the cumulative histogram, anything greater than the high threshold is necessarily an edge, and anything smaller than the low threshold is necessarily not an edge. Finally, the MyResult array is obtained, which contains only image data of 0 or 255; it is assigned directly to the image object and returned.
Illumination variation divides into abrupt and gradual changes of light. A background model can adapt to the gradual change of daylight in an outdoor environment; correspondingly, it must also cope with an indoor environment in which a light is suddenly switched on. In short, changes of light strongly influence the background model and are very likely to cause false detections. The edge-detection-based segmentation method segments the OSD characters and obtains the segmented regions while shielding detection from the interference of dynamic illumination changes. Edge detection with the Roberts, Sobel and Kirsch operators gives poor image segmentation results under factors such as non-uniform brightness.
Because noise caused by illumination degrades the result, histogram equalization is used together with the canny algorithm to remove illumination effects, and the double-threshold algorithm realizes edge detection, achieving the technical effect of shielding detection from the interference of dynamic illumination changes.
Step S110 yields the binarized image, on which contour searching is performed. A contour is a set of points (a set of discrete points) that serve as the parameters from which the convex hull is built.
In a specific embodiment, the discrete points of the contour of the text region are obtained by findContours. The discrete points (i.e., discrete pixels) of the contour are found in the acquired frame contour by findContours; a smaller object has only one layer of contour, a larger object has multiple layers of contours, and the layers are not connected to each other.
Contours and hierarchies are defined first and then contours are found.
The contours of the image are calculated with the findContours function, and each point in a contour is traversed; that is, the contours of all feature points in the image are retrieved using the findContours function, each contour being a polygon, which results in a set of feature-point contours. Contour searching is carried out on the binarized image obtained after edge detection with the canny algorithm; the contours are retrieved from the binary image and the number of detected contours is returned, so that the boundary of the text region is preliminarily extracted.
The contours parameter, which holds the contour groups, is multi-layered and is used in computing background regions; it is a double vector (a vector of vectors of points), in which each element holds a set of consecutive points, and each such set of points is one contour. The vector contours has as many elements as there are contours.
The function for extracting target contours is findContours; its input is a binary image and its output is the set of contour points of each connected region: vector<vector<Point>>. The size of the outer vector is the number of contours in the image, and the size of each inner vector is the number of points on that contour.
The findContours function takes three parameters. The first is the input image. The second is the contour retrieval mode, which specifies what kind of contours are found; here the outer contours are used, but all contours can also be retrieved, including hole parts such as the contour formed between the arm and the waist of a person in an image. The third parameter is the contour representation, i.e., the contour approximation method; the parameter used in the program keeps all points of the contour, while other values can be used so that only the start and end points of straight segments are stored.
There are three return values: the first is an image, the second the contours, and the third the hierarchy of the contours. The second return value, the contours, is a Python list that stores all contours in the image; each contour is a NumPy array containing the coordinates (x, y) of the object boundary points.
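A minimal sketch of the contour search, assuming the binary edge image `edges` from the canny step; slicing with [-2:] keeps the sketch working whether findContours returns two values (OpenCV 4.x) or three (OpenCV 3.x):

```python
import cv2

contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                        cv2.CHAIN_APPROX_SIMPLE)[-2:]

# Each contour is a NumPy array of boundary point coordinates (x, y).
for contour in contours:
    for point in contour:
        x, y = point[0]
```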
S120, packaging the discrete points into a plurality of polygonal bounding boxes, calculating the areas of the bounding boxes, and screening the bounding boxes according to preset conditions;
in a specific embodiment, a plurality of convex hulls are obtained from the discrete points through convexHull, and the obtained convex hulls are packaged into polygonal bounding boxes one by one.
Because the convex hull is an irregular pattern, there is a need to use a simple rectangle for secondary packaging, i.e. forming a polygonal bounding box.
It should be noted that, in the initial stage, a plurality of bounding boxes are formed, because noise is inevitably present in the picture, the noise is also found as discrete points, but the noise has a smaller area than the real text region.
S121, finding the convex hulls by using the convexHull function;
Convex Hull is a concept in computational geometry (graphics): in a real vector space V, for a given set X, the intersection S of all convex sets containing X is called the convex hull of X. The convex hull of X can be constructed from the convex combinations of all points (x1, x2, ..., xn) in X. OpenCV provides the convexHull() function to find the convex hull of an object in an image, i.e., the convex polygon of minimal area that encloses the given interior points. Equivalently, among the points on a given plane, a minimal set of points is found and connected into a convex polygon such that all the given points lie inside or on the polygon; this convex polygon is the two-dimensional convex hull of the given points.
The convexHull () function contains four parameters, the first parameter representing the input two-dimensional point set, mat type data; the second parameter, output parameter, is used for finding the convex hull after the output function call; a third parameter, representing the operation direction, when the identifier is true, the output convex hull is clockwise, otherwise, counterclockwise; and the fourth parameter represents the operation identifier, the default value is true, at the moment, all points of each convex hull are returned, otherwise, indexes of all points of the convex hull are returned, and when the output array is std:: vector, the identifier is ignored.
Algorithms for constructing convex hulls include the Graham scan and the Jarvis march.
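A minimal sketch of wrapping the discrete contour points in convex hulls with cv2.convexHull, assuming the `contours` list from the contour search above:

```python
import cv2

hulls = [cv2.convexHull(contour, clockwise=True, returnPoints=True)
         for contour in contours]
```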
S122, forming a polygonal bounding box by using the obtained convex hull; the bounding box may be an AABB bounding box, a bounding sphere, a direction bounding box OBB, a fixed direction convex hull FDH, etc.
Wherein a bounding box is an algorithm for solving the optimal bounding space of a discrete point set, and the basic idea is to approximately replace complex geometric objects with a geometrical body (called a bounding box) which is slightly larger in volume and simple in characteristics. A bounding box is a simple geometric space that contains objects of complex shape. The smallest polygon constructed to contain all sides in the two-dimensional convex hull is a bounding box. A plurality of bounding boxes are constructed simultaneously, and the one with the smallest area is selected.
The most common bounding box algorithms are the AABB bounding box (Axis-aligned bounding box), the bounding Sphere (Sphere), the direction bounding box OBB (Oriented bounding box) and the fixed direction convex hull FDH (Fixed directions hulls or k-DOP).
In the fields of computer graphics and computational geometry, a bounding box of a set of objects is a closed space that completely encloses a combination of objects. The efficiency of geometric operations can be improved by encapsulating complex objects in simple bounding boxes, and approximating the shape of the complex geometric body with a simple bounding box shape. And generally simple objects are relatively easy to inspect for overlap with each other.
S123, obtaining the dimension information of the bounding boxes, and obtaining the center point of each bounding box from the dimension information of that bounding box;
S124, estimating the area and the points of the bounding box. Methods for solving the bounding box area include three types: the O'Rourke algorithm, the projection rotation method and the principal component analysis method.
The bounding box here is two-dimensional, so its area can be estimated as width × height. Because the bounding box is rectangular, its points include the 4 corner points, which can also be converted into a center point plus upward and leftward offsets.
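A minimal sketch of the secondary packaging: an axis-aligned bounding rectangle per convex hull plus its center point and width × height area estimate, assuming the `hulls` list from the convex-hull step:

```python
import cv2

boxes = []
for hull in hulls:
    x, y, w, h = cv2.boundingRect(hull)        # AABB around the hull
    center = (x + w // 2, y + h // 2)          # center point of the bounding box
    boxes.append({"rect": (x, y, w, h), "center": center, "area": w * h})
```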
It should be noted that screening the bounding boxes according to the preset conditions means deleting the bounding boxes with smaller areas. In a specific embodiment, some interfering-object regions with smaller areas are filtered out and deleted, and the deletion of small-area bounding boxes is realized by setting the erosion particle size and concentration parameters. That is, by way of sorting, the smaller inner regions can be deleted in a hyper-parameter manner, a bounding-box region with a smaller area being defined as an interfering-object region. The smaller inner regions therefore need to be deleted; what is needed is the region with the largest outer envelope.
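A minimal sketch of the screening step: small-area boxes are treated as interfering-object regions and dropped, and small noise can alternatively be eroded away before contour extraction; the area threshold and kernel size stand in for the erosion particle size and concentration parameters and are illustrative assumptions:

```python
import cv2
import numpy as np

MIN_AREA = 80  # illustrative hyper-parameter, tuned per scene in practice
boxes = [b for b in boxes if b["area"] >= MIN_AREA]

# Optional pre-filtering: erode the edge image so that small noise disappears.
kernel = np.ones((3, 3), np.uint8)
cleaned = cv2.erode(edges, kernel, iterations=1)
```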
S130, enlarging the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another.
It should be noted that the characters are continuous, but they differ from one another: some characters are complex and their features are easily found, while others are simple and their features are not easily found. It is therefore necessary to enlarge the regions by artificial morphological processing (exploiting the continuity of the characters), increasing the chance of locating the simple characters, so that the simple characters are found along with the complex ones.
The discrete polygonal bounding boxes are scaled up using erosion (Erode) and dilation (Dilate) and thereby connected; that is, the dilate() function performs a dilation operation on the input image with a specific structuring element. In a specific embodiment, the dilation operation of the dilate() function is mainly used for connection: because there are gaps between the characters, the dilation operation joins the characters into a whole, which makes contour extraction convenient. Dilation uses a 3 × 3 structuring element/template to perform the correlation operation and expand the region, and it supports structuring elements of arbitrary shape. Dilation can increase the object size by one pixel (3 × 3), smooth object edges, and reduce or fill the distance between objects, so that small holes created by differences between images are reduced and the region image becomes complete.
The contours of the bounding boxes are connected to one another by setting the contour layer-number parameter of the connected region.
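A minimal sketch of step S130, assuming the `cleaned` edge image from the erosion sketch above; dilation closes the gaps between neighbouring characters so that their contours merge into one connected region (the kernel size and iteration count are illustrative):

```python
import cv2
import numpy as np

kernel = np.ones((3, 3), np.uint8)                     # 3 x 3 structuring element
connected = cv2.dilate(cleaned, kernel, iterations=3)  # join characters into one region
```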
After the step S130, if the contour sharpness of the text region does not reach the preset threshold, repeating the steps S120 and S130 until the contour sharpness of the text region reaches the preset threshold.
By means of hyper-parameter settings, because the direction and position of the light in the camera view are fixed, different scenes are given different hyper-parameters to mask the computation of certain regions. The hyper-parameters include at least three classes: 1. the threshold parameter of contour sharpness; 2. the erosion particle size and concentration parameters; 3. the contour of the connected region, i.e., the contour layer-number parameter.
For the area with strong illumination, the light position of the scene is fixed, so that the three parameters are required to be correspondingly set.
The preliminary edge detection is corrected through the erosion and connection parameter settings to obtain an accurate edge detection result. Setting the erosion and connection parameters means making the corresponding parameter settings for different scenes and different characters (size, color, background): the preliminary positioning uses basic parameters obtained by rough estimation, and the best effect is achieved by correcting the parameters for the scene and the different characters and finally combining them.
S140, after the contours of the bounding boxes are connected, taking the union of the remaining connected bounding boxes according to preset conditions.
Because denoising and filtering simple characters are both implemented based on bounding boxes, the remaining discrete points are required to be packaged into polygonal bounding boxes.
In a specific embodiment, convexHull is applied again to the remaining pixels from the bounding boxes, because after the bounding boxes of smaller area have been deleted (i.e., the noise removed) and the contours of the bounding boxes have been connected through morphological processing (i.e., the simple characters screened in), the remaining discrete pixels need to be generalized into one aggregate.
The term "remaining" is used to mean that the contour points are included in the "bounding box" described above, and these points are seen from both the "point" and the "box", where "remaining" is the corresponding "point" in the remaining "box". That is, the remaining discrete points are points corresponding to the connected bounding boxes.
The remaining bounding boxes are then merged by taking their union. There are multiple bounding boxes; they are combined, and the final form is several character strings, so that several bounding boxes remain at the end, each determining the region of a character string. If several discrete bounding boxes belong to the same character string they are, with the greatest probability, close together and can easily be merged into one; but if discrete bounding boxes belonging to different character strings were merged, the resulting bounding region would be far too large and obviously wrong, and the convexHull algorithm internally performs automatic culling.
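A minimal sketch of the union step, assuming the surviving boxes belong to the same character string; the union of axis-aligned rectangles is simply the smallest rectangle covering all of them:

```python
def union_rects(rects):
    # rects: list of (x, y, w, h) tuples for the remaining connected bounding boxes
    xs = [x for x, y, w, h in rects] + [x + w for x, y, w, h in rects]
    ys = [y for x, y, w, h in rects] + [y + h for x, y, w, h in rects]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    return (x0, y0, x1 - x0, y1 - y0)

merged = union_rects([b["rect"] for b in boxes])
```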
S150, regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
The rectangular extent of the merged bounding boxes is further refined by regression against the font width. Since the font size and region are fixed in the video, they can be processed separately, that is, different parameters can be set separately. Going from pixels through image processing introduces errors, such as a rectangle that is too large and wide or too small and narrow; these can be affected by illumination, color, background and so on, but since the font is fixed, there is a reference for comparison, such as scaling with the width and height, which serves as the reference for the regression.
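A minimal sketch of step S150 under strong assumptions: each candidate union rectangle is compared against an expected font height, and the smallest rectangle that stays close to that reference is kept as the text region (`union_candidates`, the expected height and the tolerance are hypothetical, not values from the application):

```python
def select_text_region(union_candidates, expected_font_height=24, tolerance=0.5):
    # union_candidates: list of (x, y, w, h) rectangles produced by step S140 (assumed)
    candidates = [r for r in union_candidates
                  if abs(r[3] - expected_font_height) / expected_font_height <= tolerance]
    return min(candidates, key=lambda r: r[2] * r[3]) if candidates else None
```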
In summary, the application performs binarization processing with the canny algorithm to detect the contours, finds the discrete points of the contours with findContours, finds the convex hulls with convexHull, forms the bounding boxes, calculates their areas, screens out the optimal bounding box, and completes the character recognition (extraction).
The application provides a method for identifying the text region of an OSD, applied to an electronic device 2. Referring to fig. 2, an application environment diagram of a preferred embodiment of the method for identifying the text region of the OSD according to the present application is shown.
In this embodiment, the electronic device 2 may be a terminal device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 2 includes: processor 22, memory 21, communication bus 23, and network interface 24.
The memory 21 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 2, such as a hard disk of the electronic device 2. In other embodiments, the readable storage medium may also be an external memory 21 of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic device 2.
In the present embodiment, the readable storage medium of the memory 21 is generally used to store the recognition program 20 of the text area of the OSD installed in the electronic device 2. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
The processor 22 may in some embodiments be a central processing unit (CPU), microprocessor or other data processing chip for running the program code stored in the memory 21 or processing data, for example executing the recognition program 20 for the text region of the OSD.
The communication bus 23 is used to enable connection communication between these components.
The network interface 24 may optionally comprise a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication connection between the electronic apparatus 2 and other electronic devices.
Fig. 2 shows only the electronic device 2 with components 21-24, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
Optionally, the electronic device 2 may further comprise a user interface, which may comprise an input unit such as a Keyboard (Keyboard), a voice input device such as a microphone or the like with voice recognition function, a voice output device such as a sound box, a headset or the like, and optionally a standard wired interface, a wireless interface.
Optionally, the electronic device 2 may also comprise a display, which may also be referred to as a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-control liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 2 and for displaying recognized text.
Optionally, the electronic device 2 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, etc., which are not described herein.
In the embodiment of the apparatus shown in fig. 2, the memory 21, as a computer storage medium, may include an operating system and the recognition program 20 for the text region of the OSD; the processor 22 executes the recognition program 20 for the text region of the OSD stored in the memory 21 to perform the following steps: preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region; packaging the discrete points into a plurality of polygonal bounding boxes and screening the bounding boxes according to preset conditions; enlarging the screened bounding boxes proportionally with a dilation algorithm; taking the union of the remaining connected bounding boxes according to preset conditions; and regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
In other embodiments, the recognition program 20 for the text region of the OSD may also be divided into one or more modules, the one or more modules being stored in the memory 21 and executed by the processor 22 to accomplish the application. A module in the present application refers to a series of computer program instruction segments capable of performing a specified function.
The recognition program 20 for the text region of the OSD may be divided into: an image preprocessing unit, a bounding-box forming unit and a bounding-box selecting unit. The image preprocessing unit preprocesses an OSD file to obtain frame-by-frame images, performs binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquires the frame contours of the filtered images, and finds the discrete points of the contour of the text region. The bounding-box forming unit packages the discrete points into a plurality of polygonal bounding boxes, calculates the areas of the bounding boxes, screens the bounding boxes according to preset conditions, enlarges the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another, and, after the contours of the bounding boxes are connected, takes the union of the remaining connected bounding boxes according to preset conditions. The bounding-box selecting unit regresses the rectangular extent of the merged bounding boxes against the width of the text-region font and selects the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
In addition, an embodiment of the present application further provides a computer-readable storage medium that includes a recognition program for the text region of an OSD which, when executed by a processor, performs the following operations: preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region; packaging the discrete points into a plurality of polygonal bounding boxes and screening the bounding boxes according to preset conditions; enlarging the screened bounding boxes proportionally with a dilation algorithm; taking the union of the remaining connected bounding boxes according to preset conditions; and regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized. The embodiments of the computer-readable storage medium of the present application are substantially the same as the embodiments of the above method for identifying the text region of the OSD and of the electronic device, and are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (7)

1. A method for identifying the text region of an OSD, applied to an electronic device, characterized by comprising the following steps:
S110, preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region;
S120, packaging the discrete points into a plurality of polygonal bounding boxes, calculating the areas of the bounding boxes, and screening the bounding boxes according to preset conditions; a plurality of convex hulls are obtained from the discrete points through convexHull, and the obtained convex hulls are packaged one by one into polygonal bounding boxes; screening of the bounding boxes is realized by setting erosion particle size and concentration parameters;
S130, enlarging the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another; the contours of the bounding boxes are connected to one another by setting the contour layer-number parameter of the connected region;
S140, after the contours of the bounding boxes are connected, taking the union of the remaining connected bounding boxes according to preset conditions;
S150, regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
2. The method for recognizing text areas of an OSD as claimed in claim 1, wherein,
in step S110, the discrete points of the contour of the text region are obtained by findContours.
3. The method according to claim 1, wherein after the step S130, if the contour sharpness of the text region does not reach a preset threshold, the steps S120 and S130 are repeated until the contour sharpness of the text region reaches the preset threshold.
4. An electronic device, comprising a memory and a processor, characterized in that the memory stores a recognition program for the text region of an OSD which, when executed by the processor, implements the following steps:
S110, preprocessing an OSD file to obtain frame-by-frame images, performing binarization threshold filtering on the frame-by-frame images with the canny algorithm, acquiring the frame contours of the filtered images, and finding the discrete points of the contour of the text region;
S120, packaging the discrete points into a plurality of polygonal bounding boxes, calculating the areas of the bounding boxes, and screening the bounding boxes according to preset conditions; a plurality of convex hulls are obtained from the discrete points through convexHull, and the obtained convex hulls are packaged one by one into polygonal bounding boxes; screening of the bounding boxes is realized by setting erosion particle size and concentration parameters;
S130, enlarging the screened bounding boxes proportionally with a dilation algorithm so that the contours of the bounding boxes connect with one another; the contours of the bounding boxes are connected to one another by setting the contour layer-number parameter of the connected region;
S140, after the contours of the bounding boxes are connected, taking the union of the remaining connected bounding boxes according to preset conditions;
S150, regressing the rectangular extent of the merged bounding boxes against the width of the text-region font, and selecting the bounding box with the smallest rectangular extent, the region corresponding to which is the text region to be recognized.
5. The electronic device of claim 4, characterized in that,
in step S110, the discrete points of the contour of the text region are obtained by findContours.
6. The electronic device of claim 4, characterized in that,
after the step S130, if the contour sharpness of the text region does not reach the preset threshold, repeating the steps S120 and S130 until the contour sharpness of the text region reaches the preset threshold.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising a program for identifying a text region of an OSD, which when executed by a processor, implements the steps of the method for identifying a text region of an OSD according to any one of claims 1 to 3.
CN201910885665.8A 2019-09-19 2019-09-19 Method, device and storage medium for identifying text region of OSD (on Screen display) Active CN110717489B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910885665.8A CN110717489B (en) 2019-09-19 2019-09-19 Method, device and storage medium for identifying text region of OSD (on Screen display)
PCT/CN2019/118284 WO2021051604A1 (en) 2019-09-19 2019-11-14 Method for identifying text region of osd, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885665.8A CN110717489B (en) 2019-09-19 2019-09-19 Method, device and storage medium for identifying text region of OSD (on Screen display)

Publications (2)

Publication Number Publication Date
CN110717489A CN110717489A (en) 2020-01-21
CN110717489B true CN110717489B (en) 2023-09-15

Family

ID=69209932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885665.8A Active CN110717489B (en) 2019-09-19 2019-09-19 Method, device and storage medium for identifying text region of OSD (on Screen display)

Country Status (2)

Country Link
CN (1) CN110717489B (en)
WO (1) WO2021051604A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405345B (en) * 2020-03-19 2022-03-01 展讯通信(上海)有限公司 Image processing method, image processing device, display device and readable storage medium
CN111444903B (en) * 2020-03-23 2022-12-09 西安交通大学 Method, device and equipment for positioning characters in cartoon bubbles and readable storage medium
CN111783493A (en) * 2020-06-18 2020-10-16 福州富昌维控电子科技有限公司 Identification method and identification terminal for batch two-dimensional codes
CN111611783B (en) * 2020-06-18 2023-04-25 山东旗帜信息有限公司 Positioning and segmentation method and device for graphic form
CN112019925B (en) * 2020-10-29 2021-01-22 蘑菇车联信息科技有限公司 Video watermark identification processing method and device
CN112800824B (en) * 2020-12-08 2024-02-02 北京方正印捷数码技术有限公司 Method, device, equipment and storage medium for processing scanned file
CN113688815A (en) * 2021-06-01 2021-11-23 无锡启凌科技有限公司 Medicine packaging text computer recognition algorithm and device for complex illumination environment
CN113486892B (en) * 2021-07-02 2023-11-28 东北大学 Production information acquisition method and system based on smart phone image recognition
CN114125705B (en) * 2021-11-19 2024-03-08 中国电子科技集团公司第二十八研究所 ADS-B base station monitoring range estimation method based on mathematical morphology
US11741732B2 (en) 2021-12-22 2023-08-29 International Business Machines Corporation Techniques for detecting text
CN114266800B (en) * 2021-12-24 2023-05-05 中设数字技术股份有限公司 Method and system for generating multiple rectangular bounding boxes of plane graph
CN115620302B (en) * 2022-11-22 2023-12-01 山东捷瑞数字科技股份有限公司 Picture font identification method, system, electronic equipment and storage medium
CN116433701B (en) * 2023-06-15 2023-10-10 武汉中观自动化科技有限公司 Workpiece hole profile extraction method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563380A (en) * 2017-09-08 2018-01-09 上海理工大学 A kind of vehicle license plate detection recognition method being combined based on MSER and SWT
CN109002824A (en) * 2018-06-27 2018-12-14 淮阴工学院 A kind of architectural drawing label information detection method based on OpenCV
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2220590A1 (en) * 2007-11-28 2010-08-25 Lumex As A method for processing optical character recognition (ocr) data, wherein the output comprises visually impaired character images
CN108171104B (en) * 2016-12-08 2022-05-10 腾讯科技(深圳)有限公司 Character detection method and device
CN108805116B (en) * 2018-05-18 2022-06-24 浙江蓝鸽科技有限公司 Image text detection method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563380A (en) * 2017-09-08 2018-01-09 上海理工大学 A kind of vehicle license plate detection recognition method being combined based on MSER and SWT
CN109002824A (en) * 2018-06-27 2018-12-14 淮阴工学院 A kind of architectural drawing label information detection method based on OpenCV
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于聚类分析的手写维吾尔文档图像中单词切分技术研究";阿依萨代提•阿卜力孜;《中国优秀硕士学位论文全文数据库信息科技辑》;20180215(第02期);第I138-1706页 *

Also Published As

Publication number Publication date
CN110717489A (en) 2020-01-21
WO2021051604A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN110717489B (en) Method, device and storage medium for identifying text region of OSD (on Screen display)
CN113781402B (en) Method and device for detecting scratch defects on chip surface and computer equipment
JP6719457B2 (en) Method and system for extracting main subject of image
Zhang et al. Image segmentation based on 2D Otsu method with histogram analysis
EP3332356B1 (en) Semi-automatic image segmentation
CN110232713B (en) Image target positioning correction method and related equipment
US20080166016A1 (en) Fast Method of Object Detection by Statistical Template Matching
US20170039723A1 (en) Image Object Segmentation Using Examples
CN109086724B (en) Accelerated human face detection method and storage medium
CN108830832A (en) A kind of plastic barrel surface defects detection algorithm based on machine vision
CN108229342B (en) Automatic sea surface ship target detection method
KR20130056309A (en) Text-based 3d augmented reality
CN108399424B (en) Point cloud classification method, intelligent terminal and storage medium
CN111259878A (en) Method and equipment for detecting text
CN107545223B (en) Image recognition method and electronic equipment
CN108665495B (en) Image processing method and device and mobile terminal
US8396297B2 (en) Supervised edge detection using fractal signatures
CN109255792B (en) Video image segmentation method and device, terminal equipment and storage medium
CN115760820A (en) Plastic part defect image identification method and application
CN115471476A (en) Method, device, equipment and medium for detecting component defects
CN115439523A (en) Method and equipment for detecting pin size of semiconductor device and storage medium
JP2017500662A (en) Method and system for correcting projection distortion
CN108205641B (en) Gesture image processing method and device
CN113343987B (en) Text detection processing method and device, electronic equipment and storage medium
CN115187744A (en) Cabinet identification method based on laser point cloud

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant