CN113887430B - Method and system for locating polling video text
- Publication number: CN113887430B
- Application number: CN202111165413.1A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/232: Pattern recognition; analysing; clustering techniques (non-hierarchical)
- G06T5/20: Image enhancement or restoration using local operators
- G06T5/30: Erosion or dilatation, e.g. thinning
- G06T5/40: Image enhancement or restoration using histogram techniques
- G06T7/11: Image analysis; region-based segmentation
- G06T2207/10016: Image acquisition modality: video; image sequence
Abstract
The invention belongs to the technical field of computer vision and provides a polling video text positioning method and system. Key frame images are first extracted from the polling video and preprocessed to obtain an LAB image and a gray level image; the pixel points of the LAB image are divided into a plurality of super pixels, and the super pixels constituting candidate text regions are screened with the aid of the gray level image; the screened super pixels are then classified; finally, corner detection is performed on the gray level image, the number of corners contained in each class is counted, and the class with the most corners is selected as the text region of the video picture, thereby improving both the speed and the accuracy of text positioning in polling video.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a polling video text positioning method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
A conference polling video is formed by the monitoring pictures of the different participating venues, each uploaded in turn for a fixed duration, and it therefore contains real-time conference information for every participating venue. According to the meeting specifications for sub-venues in a video conference, each sub-venue must place a baffle or background board in its meeting room, printed with characters in a prescribed font format, which serve as key information for checking that venue's attendance. If the checking of text regions in the video conference polling picture relies on manual judgment, back-office check-in staff are tied up for long periods and work efficiency is hard to improve.
By using a computer to perform digital image processing on the key frames of the polling video, the text regions in the uploaded monitoring pictures can be located and cropped, assisting back-office check-in staff in verifying the attendance of each venue, reducing their workload, and improving work efficiency.
In recent years, researchers have studied methods of locating polling video text using digital image processing algorithms; for example, the application CN201710027704.1 by inventors Liu Mingzhu, Li Wenjing and Zheng Yunfei, entitled "Text information feature extraction and recognition method based on Gabor filter", is one such method. However, most existing methods adopt deep learning algorithms and suffer from drawbacks such as large sample requirements, high algorithmic complexity, and heavy computation.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a polling video text positioning method and system, so as to improve the speed and accuracy of text positioning in polling video.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the present invention provides a polling video text positioning method, comprising:
Acquiring a polling video and extracting a polling video key frame image;
Preprocessing a key frame image to obtain an LAB image and a gray level image;
dividing pixel points in the LAB image into a plurality of super pixels;
Screening super pixels forming candidate text areas by combining the gray level images;
classifying the super pixels forming the candidate text region;
performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
Further, the specific steps of extracting the polling video key frame image are as follows:
calculating the frame difference distance of each frame of image, comparing it with a set threshold, and judging the image to be a key frame image when the frame difference distance exceeds the threshold, and a non-key frame image otherwise.
Further, the preprocessing includes:
Performing morphological dilation operation on the key frame image to obtain a dilated image;
converting the expanded image from the RGB color space to the LAB color space to obtain an LAB image;
And carrying out gray scale processing on the key frame image to obtain a gray scale image.
Further, the specific step of dividing the pixel points in the LAB image into a plurality of super pixels is as follows:
dividing pixel points in the LAB image into a plurality of initial super pixels with the same size;
calculating a clustering center of each super pixel;
for each pixel in the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels in the LAB image, and updating the super pixel to which each pixel belongs;
judging whether the iteration has finished; if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel in the LAB image belongs.
Further, the specific steps of screening the super pixels forming the candidate text region are as follows:
calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram;
taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
Further, the specific steps of classification are as follows:
calculating the similarity between the super pixels forming the candidate text region;
setting iteration times, and iteratively calculating an attraction degree matrix and a attribution degree matrix based on the similarity;
based on the attraction degree matrix and the attribution degree matrix, a decision matrix is calculated, and the super pixels are divided into a plurality of classes.
Further, the specific steps of the corner detection are as follows:
Calculating the gradient value of each pixel point in the gray level image in the horizontal and vertical directions, and weighting by using a Gaussian function to obtain an autocorrelation matrix;
calculating the corner response value of each pixel point based on the autocorrelation matrix, comparing the corner response value with a corner discrimination threshold value, and selecting candidate corner points;
And carrying out local non-maximum suppression on the candidate corner points to obtain corner points in the gray level image.
A second aspect of the present invention provides a polling video text positioning system comprising:
a video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
a text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the polling video text positioning method described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
The polling video text positioning method provided by the invention locates the text regions of polling video pictures quickly and accurately by using clustering and corner detection algorithms, reducing the complexity of text region positioning and improving the applicability of the text positioning method.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of a method for locating a polling video text according to an embodiment of the present invention;
FIG. 2 is a block diagram of a polling video text positioning system according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 1, the present embodiment provides a polling video text positioning method that takes the text region of a polling video picture as its object. First, key frame images are extracted from the polling video and preprocessed to obtain an LAB image and a gray level image; the pixel points of the LAB image are divided into a plurality of super pixels, and the super pixels constituting candidate text regions are screened with the aid of the gray level image; the screened super pixels are then classified; finally, corner detection is performed on the gray level image, the number of corners contained in each class is counted, and the class with the most corners is selected as the text region in the video picture, improving both the speed and the accuracy of text positioning in polling video.
Step 1: and acquiring the polling video and extracting key frame images of the polling video.
Taking a conference polling video as an example: the conference polling video is formed during a video conference by each sub-venue uploading its monitoring picture in turn for a fixed duration. A conference polling video key frame is the video frame just before the switch between adjacent venues, i.e. the last frame of each venue's uploaded picture, and can represent the conference content of that venue.
The polling video key frames are extracted with a threshold detection algorithm: the frame difference distance of each frame of image is calculated and compared with a set threshold; when the frame difference distance exceeds the threshold, the frame is judged to be a key frame, and otherwise a non-key frame. Specifically:
(101) Calculating the frame difference distance of each frame of image:

δ_k = Σ_{i∈{R,G,B}} Σ_{(x,y)∈D_1} |F_k(x,y,i) − F_{k+1}(x,y,i)|,  k = 1, 2, …, K−1

wherein K is the total number of frames of the video; δ_k is the frame difference distance of the k-th frame; D_1 is a rectangular discrimination region of the video picture, chosen so that it differs strongly between venues and is highly similar within the same venue; F_k(x,y,i) and F_{k+1}(x,y,i) are the elements in row x and column y of the channel-i pixel value matrices of the k-th and (k+1)-th frames, respectively;
(102) Setting a suitable threshold T_1 from the calculated data samples; when δ_k > T_1, the k-th frame is judged to be a key frame; when δ_k ≤ T_1, the k-th frame is judged to be a non-key frame.
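By way of illustration only, a minimal Python sketch of this key-frame extraction (assuming OpenCV is available; the discrimination region D_1 and the threshold T_1 are placeholders to be tuned on real data) might read:

```python
import cv2
import numpy as np

def extract_key_frames(video_path, roi, threshold):
    """Key-frame extraction by frame differencing over a discrimination region.

    roi = (y0, y1, x0, x1) delimits the rectangular region D_1 (an assumed
    layout); threshold corresponds to T_1 and is chosen empirically.
    """
    cap = cv2.VideoCapture(video_path)
    key_frames = []
    ok, prev = cap.read()
    while ok:
        ok, curr = cap.read()
        if not ok:
            break
        y0, y1, x0, x1 = roi
        # Frame difference distance: sum of absolute channel differences over D_1
        delta = np.abs(prev[y0:y1, x0:x1].astype(np.int32)
                       - curr[y0:y1, x0:x1].astype(np.int32)).sum()
        if delta > threshold:
            key_frames.append(prev)  # the last frame before the venue switch
        prev = curr
    cap.release()
    return key_frames
```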
Step 2: and preprocessing the extracted key frame image to obtain an LAB image and a gray level image.
(201) Performing a morphological dilation on the acquired key frame image F with a square structuring element C to obtain the dilated image F_dilate:

F_dilate(x,y,i) = max{ F(x−x_c, y−y_c, i) | (x−x_c, y−y_c) ∈ D_f; (x_c, y_c) ∈ D_c },  i ∈ {R, G, B}

wherein C is a 4×4 all-zero matrix (a flat structuring element) and D_f, D_c are the pixel coordinate domains of F and C, respectively;
(202) Converting the image F_dilate from the RGB color space to the LAB color space to obtain the LAB image F_dilate;
(203) Converting the key frame image F to gray scale to obtain the corresponding gray level image F_gray.
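For this preprocessing step, a sketch along the following lines would suffice (again assuming OpenCV, whose decoded frames are BGR rather than RGB; a 4×4 flat all-ones kernel stands in for the all-zero structuring element C):

```python
import cv2
import numpy as np

def preprocess(key_frame):
    """Produce the LAB image and the gray level image of a key frame."""
    # (201) Morphological dilation with a 4x4 flat structuring element
    kernel = np.ones((4, 4), np.uint8)
    dilated = cv2.dilate(key_frame, kernel)
    # (202) Color-space conversion to LAB (OpenCV stores frames as BGR)
    lab = cv2.cvtColor(dilated, cv2.COLOR_BGR2LAB)
    # (203) Gray level image from the original (un-dilated) key frame
    gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    return lab, gray
```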
Step 3: super-pixel segmentation of the LAB image F_dilate based on the SLIC (Simple Linear Iterative Clustering) algorithm, dividing the pixel points of the LAB image into a plurality of super pixels: divide the pixel points of the LAB image into a plurality of initial super pixels of the same size; compute the cluster center of each super pixel; for each pixel, find the nearest cluster center among the cluster centers within its neighborhood, take the corresponding super pixel as the pixel's new super pixel, traverse all pixels of the LAB image and update the super pixel of each; judge whether the iteration has finished, and if not, return to recompute the cluster centers of the super pixels; otherwise, merge every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and update the super pixel of each pixel in the LAB image.
The SLIC algorithm segments the LAB image F_dilate into super pixels, each consisting of a series of pixel points with similar color, brightness and other characteristics. The algorithm flow is as follows:
(301) For each pixel of the LAB image F_dilate, constructing a 5-dimensional feature vector [l, a, b, x, y] from the pixel's l, a, b components in the LAB color space and its x, y coordinates in the pixel coordinate system, to describe the pixel's features;
(302) Setting the number of super pixels h = 300 and defining a grid interval s = √(N/h), where N is the total number of pixels of the image F_dilate; dividing the pixel points of the LAB image into a plurality of initial super pixels, and initializing the cluster center cluster_i = [l_i, a_i, b_i, x_i, y_i] (i = 1, 2, …, 300) of each initial super pixel so that the centers are uniformly distributed at the grid interval;
(303) Calculating the distance between each pixel of the LAB image and the cluster centers; the distance measure between pixel n (n = 1, 2, …, N) and cluster center cluster_i is defined as

d_c = √((l_n − l_i)² + (a_n − a_i)² + (b_n − b_i)²)
d_s = √((x_n − x_i)² + (y_n − y_i)²)
d_ni = √((d_c / p_c)² + (d_s / s)²)

wherein d_ni is the integrated distance; d_c and d_s are the color distance and the spatial distance, respectively; p_c is a color coefficient that adjusts the compactness of the super pixels, set to 30;
(304) Performing T iterations (preferably T = 10), updating the super pixels and their cluster centers in each iteration as follows:
among the cluster centers contained in the 2s×2s neighborhood of pixel n, taking the super-pixel label i of the cluster center cluster_i with the smallest integrated distance to the pixel as the pixel's new label, denoted label_n = i;
traversing all pixels of the image, computing each pixel's new label, and re-segmenting the super pixels according to the new labels, with the mean of the 5-dimensional feature vectors of all pixel points within a super pixel taken as that super pixel's cluster center;
(305) Merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the label of each pixel of the image, denoted label_n = j (j = 1, 2, …, h_1), where h_1 is the number of super pixels after updating and h_1 ≤ h. Preferably, the threshold is 50.
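Rather than re-implementing SLIC, this step can be reproduced with scikit-image's implementation; the sketch below maps the embodiment's parameters onto it (the keyword max_num_iter applies to recent scikit-image releases, and enforce_connectivity plays roughly the role of the small-super-pixel merging of step (305)):

```python
from skimage.segmentation import slic

# lab_image: the LAB image F_dilate produced by the preprocessing step
labels = slic(
    lab_image,
    n_segments=300,             # h = 300 initial super pixels
    compactness=30,             # p_c = 30, the color/space trade-off
    max_num_iter=10,            # T = 10 iterations
    convert2lab=False,          # the input is already in LAB space
    enforce_connectivity=True,  # absorbs undersized fragments, cf. step (305)
)
# labels[y, x] is the super-pixel index of the pixel at (x, y)
```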
Step 4: super-pixel screening based on the gray level histogram: screening the super pixels included in the candidate text region with the aid of the histogram of the gray level image. Specifically:
(401) For the h_1 super pixels produced by the SLIC algorithm, calculating the gray level histogram of each super pixel in the gray level image F_gray, and taking the mean frequency of adjacent gray levels (preferably 3 adjacent gray levels) as the frequency of the middle gray level, to obtain a smoothed gray level histogram;
(402) Traversing all super pixels and taking the M super pixels whose smoothed gray level histogram has n peaks (preferably n = 2, since a super pixel covering text typically contains both the stroke gray level and the board gray level) into the candidate text region, and constructing a two-dimensional vector [peak_1, peak_2] from the gray values corresponding to the peaks to describe each qualifying super pixel; that is, the gray values at the peaks serve as the features of the super pixels constituting the candidate text region.
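A NumPy sketch of this screening step follows; the simple interior-maximum peak test is an assumption, since the embodiment does not spell out its peak criterion:

```python
import numpy as np

def screen_superpixels(gray, labels, n_peaks=2, smooth=3):
    """Return (label, [peak1, peak2]) pairs for candidate text super pixels."""
    candidates = []
    for sp in np.unique(labels):
        values = gray[labels == sp]
        hist, _ = np.histogram(values, bins=256, range=(0, 256))
        # Smoothing: each gray level takes the mean frequency of its neighbors
        hist = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
        # Peaks: gray levels whose frequency exceeds both neighbors'
        peaks = [g for g in range(1, 255)
                 if hist[g] > hist[g - 1] and hist[g] > hist[g + 1]]
        if len(peaks) == n_peaks:
            candidates.append((sp, peaks))
    return candidates
```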
Step 5: classifying the super pixels constituting the candidate text region based on the AP (Affinity Propagation) algorithm: calculating the similarity between the super pixels of the candidate text region from their features; setting the number of iterations and iteratively computing the attraction matrix and attribution matrix from the similarity; computing the decision matrix from the attraction and attribution matrices and dividing the super pixels into a plurality of classes. Specifically:
The AP algorithm adaptively classifies the M super pixels constituting the candidate text region; each class of similar super pixels forms a cluster, so that the candidate text region can be further narrowed to a single class. The algorithm flow is as follows:
(501) Calculating the similarity between super pixels from the two-dimensional vectors [peak_1, peak_2] and constructing the similarity matrix S:

S(i,k) = −[(peak_{1i} − peak_{1k})² + (peak_{2i} − peak_{2k})²]

wherein S(i,k) is the similarity between the i-th and k-th super pixels, and [peak_{1i}, peak_{2i}] and [peak_{1k}, peak_{2k}] are the 2-dimensional vectors of the i-th and k-th super pixels, respectively; the main diagonal element S(i,i), which represents the suitability of the i-th super pixel as a cluster center, is set to the mean of the elements of the matrix;
(502) Calculating the attraction matrix R and the attribution matrix A:
The attraction R(i,k) represents the degree to which the i-th super pixel supports the k-th super pixel as a cluster center:

R(i,k) = S(i,k) − max_{k'≠k} { A(i,k') + S(i,k') }

wherein, on the first pass, the attribution matrix A is set to the zero matrix;
The attribution A(i,k) represents the suitability of the k-th super pixel as the cluster center of the i-th super pixel; when i ≠ k,

A(i,k) = min{ 0, R(k,k) + Σ_{i'∉{i,k}} max{0, R(i',k)} }

and when i = k,

A(k,k) = Σ_{i'≠k} max{0, R(i',k)}
(503) Updating the attraction matrix R and the attribution matrix A with damping:

R_{t+1} := (1−λ)·R_{t+1} + λ·R_t
A_{t+1} := (1−λ)·A_{t+1} + λ·A_t

wherein R_{t+1} and A_{t+1} are the attraction and attribution matrices of the (t+1)-th iteration, and R_t and A_t those of the t-th iteration; λ is a damping coefficient, set to 0.5, that balances the results of two adjacent iterations; in each ':=' assignment, the matrix on the left-hand side is the updated value and the matrix of the same name on the right-hand side is the pre-update value.
(504) Repeating steps (502) and (503); the number of iterations is set according to the data volume, preferably 30;
(505) Letting the decision matrix E = R_30 + A_30; if the maximum of the i-th row of E is E(i,k), the i-th super pixel belongs to the cluster centered on the k-th super pixel; traversing every row of E divides the M super pixels into q clusters, where q ≤ M.
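scikit-learn's AffinityPropagation implements the same responsibility/availability message passing; a sketch with the embodiment's settings follows (one small difference: scikit-learn's default preference is the median similarity, whereas step (501) uses the mean):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# features: M x 2 array of [peak1, peak2] vectors from the screening step
features = np.array([peaks for _, peaks in candidates], dtype=float)

ap = AffinityPropagation(
    damping=0.5,        # the damping coefficient lambda of step (503)
    max_iter=30,        # 30 iterations, as in step (504)
    preference=None,    # default: median similarity (the patent uses the mean)
    random_state=0,
)
# affinity='euclidean' (the default) uses negative squared Euclidean
# similarity, matching the S(i,k) of step (501)
cluster_labels = ap.fit_predict(features)  # cluster index of each super pixel
```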
Step 6: text region discrimination based on corner detection: performing corner detection, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
The Harris operator is used to detect corners over the q clusters in the image F_gray, and the cluster with the most corners is selected as the text region in the video picture. The corner detection proceeds as follows:
(601) Calculating the gradients F_x and F_y of each pixel of the gray level image F_gray in the horizontal and vertical directions, weighting them with a Gaussian function ω(x,y), and constructing the autocorrelation matrix M:

M = Σ_{(x,y)} ω(x,y) · [ F_x²  F_x·F_y ; F_x·F_y  F_y² ]
(602) Calculating the corner response value R of each pixel of the gray level image F_gray from the autocorrelation matrix:

R = det M − β (trace M)²

wherein det M is the determinant of the matrix M and trace M is its trace; β is an empirical parameter, set to 0.05;
comparing each pixel's corner response value with a corner discrimination threshold to select candidate corners, specifically: setting a corner discrimination threshold T_2 and judging a target pixel to be a candidate corner if R > T_2;
(603) Performing local non-maximum suppression on the candidate corners to obtain the corners of the image F_gray.
The number of corners contained in each of the q clusters is counted, and the cluster containing the most corners is selected as the text region in the video picture.
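A closing sketch of this step (cv2.cornerHarris computes R = det M − k·(trace M)²; the relative threshold below is an assumed stand-in for T_2 and for the local non-maximum suppression, which the OpenCV call itself does not perform):

```python
import cv2
import numpy as np

def locate_text_region(gray, labels, candidates, cluster_labels, rel_thresh=0.01):
    """Select the cluster containing the most Harris corners as the text region."""
    # Harris response map; k = 0.05 matches the beta of step (602)
    response = cv2.cornerHarris(np.float32(gray), 2, 3, 0.05)
    corners = response > rel_thresh * response.max()  # stand-in for T_2 / NMS

    # Count corners per cluster over the super pixels belonging to it
    counts = {}
    for (sp, _), cl in zip(candidates, cluster_labels):
        counts[cl] = counts.get(cl, 0) + int(corners[labels == sp].sum())
    best = max(counts, key=counts.get)

    # Binary mask of the located text region
    mask = np.zeros(gray.shape, dtype=bool)
    for (sp, _), cl in zip(candidates, cluster_labels):
        if cl == best:
            mask |= labels == sp
    return mask
```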
By applying clustering and corner detection algorithms appropriately, the invention locates the text region of a polling video picture quickly and accurately, reduces computational complexity, and improves applicability.
Example 2
As shown in fig. 2, the present embodiment provides a polling video text positioning system, which specifically includes the following modules:
The device comprises a video key frame extraction module, an image preprocessing module, a super-pixel segmentation module, a super-pixel screening module, a super-pixel classification module and a text region discrimination module; the video key frame extraction module is connected with the image preprocessing module, the output end of the image preprocessing module is connected to the super-pixel segmentation module, the output end of the super-pixel segmentation module is connected to the super-pixel screening module, the output end of the super-pixel screening module is connected to the super-pixel classification module, and the output end of the super-pixel classification module is connected to the text region discrimination module.
A key frame image is obtained by the video key frame extraction module and preprocessed by the image preprocessing module; the super-pixel segmentation module segments the image based on the SLIC algorithm; the super-pixel screening module obtains the candidate text region based on the gray level histogram; the super-pixel classification module partitions the candidate text region based on the AP algorithm; and finally the text region discrimination module locates the text region based on corner detection, which is significant for improving the speed and accuracy of text positioning in polling video.
A video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
a text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
By applying clustering and corner detection algorithms appropriately, the system can locate the text region of a polling video picture quickly and accurately, reducing system complexity and improving the applicability of the system.
It should be noted that each module in this embodiment corresponds one-to-one with a step in Example 1 and is implemented in the same way, which is not repeated here.
Example 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method described in Example 1.
Example 4
The present embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the polling video text positioning method described in Example 1 when executing the program.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium which, when executed, performs the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A polling video text positioning method, comprising:
Acquiring a polling video and extracting a polling video key frame image;
Preprocessing a key frame image to obtain an LAB image and a gray level image;
dividing pixel points in the LAB image into a plurality of super pixels;
Screening super pixels forming candidate text areas by combining the gray level images;
classifying the super pixels forming the candidate text region;
Performing corner detection on the gray level image, counting the number of corner points contained in each class, and selecting the class with the largest number of corner points as a text area in a video picture;
The specific steps of dividing the pixel points of the LAB image into a plurality of super pixels are as follows: dividing the pixel points of the LAB image into a plurality of initial super pixels of the same size; calculating the cluster center of each super pixel; for each pixel of the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels of the LAB image and updating the super pixel to which each pixel belongs; judging whether the iteration has finished, and if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel of the LAB image belongs;
The specific steps of screening the super pixels constituting the candidate text region are as follows: calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram; and taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
2. The polling video text positioning method according to claim 1, wherein the specific steps of extracting the polling video key frame image are as follows:
calculating the frame difference distance of each frame of image, comparing it with a set threshold, and judging the image to be a key frame image when the frame difference distance exceeds the threshold, and a non-key frame image otherwise.
3. The polling video text positioning method of claim 1, wherein the preprocessing comprises:
Performing morphological dilation operation on the key frame image to obtain a dilated image;
converting the expanded image from the RGB color space to the LAB color space to obtain an LAB image;
And carrying out gray scale processing on the key frame image to obtain a gray scale image.
4. The polling video text positioning method according to claim 1, wherein the specific steps of classification are as follows:
calculating the similarity between the super pixels forming the candidate text region;
setting iteration times, and iteratively calculating an attraction degree matrix and a attribution degree matrix based on the similarity;
based on the attraction degree matrix and the attribution degree matrix, a decision matrix is calculated, and the super pixels are divided into a plurality of classes.
5. The polling video text positioning method as claimed in claim 1, wherein the specific steps of corner detection are as follows:
Calculating the gradient value of each pixel point in the gray level image in the horizontal and vertical directions, and weighting by using a Gaussian function to obtain an autocorrelation matrix;
calculating the corner response value of each pixel point based on the autocorrelation matrix, comparing the corner response value with a corner discrimination threshold value, and selecting candidate corner points;
And carrying out local non-maximum suppression on the candidate corner points to obtain corner points in the gray level image.
6. A polling video text positioning system, comprising:
a video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
A text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corner points contained in each class, and selecting the class with the largest number of corner points as a text area in a video picture;
The specific steps of dividing the pixel points of the LAB image into a plurality of super pixels are as follows: dividing the pixel points of the LAB image into a plurality of initial super pixels of the same size; calculating the cluster center of each super pixel; for each pixel of the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels of the LAB image and updating the super pixel to which each pixel belongs; judging whether the iteration has finished, and if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel of the LAB image belongs;
The specific steps of screening the super pixels constituting the candidate text region are as follows: calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram; and taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
7. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method as claimed in any one of claims 1-5.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the polling video text positioning method as claimed in any one of claims 1-5 when executing the program.
Priority Application (1)
- CN202111165413.1A: priority date 2021-09-30; filing date 2021-09-30; title: Method and system for locating polling video text (CN113887430B)
Publications (2)
- CN113887430A: published 2022-01-04
- CN113887430B: granted 2024-04-30
Family ID: 79005119 (family application CN202111165413.1A, status Active)
Patent Citations (5)
- CN103699895A (priority 2013-12-12, published 2014-04-02): Method for detecting and extracting text in video
- CN105528794A (priority 2016-01-15, published 2016-04-27): Moving object detection method based on Gaussian mixture model and superpixel segmentation
- CN106204570A (priority 2016-07-05, published 2016-12-07): A corner detection method based on a non-causal fractional-order gradient operator
- CN112150412A (priority 2020-08-31, published 2020-12-29): Insulator self-explosion defect detection method based on projection curve analysis
- CN112990191A (priority 2021-01-06, published 2021-06-18): Shot boundary detection and key frame extraction method based on subtitle video
Family Cites (1)
- JP4861845B2 (priority 2007-02-05, Fujitsu): Telop character extraction program, recording medium, method and apparatus
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant