CN113887430B - Method and system for locating polling video text
- Publication number: CN113887430B
- Application number: CN202111165413.1A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/232: Pattern recognition; analysing; clustering techniques (non-hierarchical)
- G06T5/20: Image enhancement or restoration using local operators
- G06T5/30: Erosion or dilatation, e.g. thinning
- G06T5/40: Image enhancement or restoration using histogram techniques
- G06T7/11: Image analysis; region-based segmentation
- G06T2207/10016: Image acquisition modality: video; image sequence
Abstract
The invention belongs to the technical field of computer vision and provides a polling video text positioning method and system. Key frame images are first extracted from the polling video and preprocessed to obtain an LAB image and a gray level image; the pixel points of the LAB image are divided into a plurality of super pixels, and the super pixels constituting candidate text regions are screened with the aid of the gray level image; the screened super pixels are then classified; finally, corner detection is performed on the gray level image, the number of corners contained in each class is counted, and the class with the most corners is selected as the text region of the video picture, thereby improving both the speed and the accuracy of text positioning in polling video.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a polling video text positioning method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
A conference polling video is formed by the monitoring pictures of the different participating venues, each uploaded in turn for a fixed duration, and it therefore contains real-time conference information for every participating venue. According to the meeting specifications for sub-venues in a video conference, each sub-venue must place a baffle or background board in its meeting room, printed with characters in a prescribed font format, which serve as key information for checking that venue's attendance. If the checking of text regions in the video conference polling picture relies on manual judgment, back-office check-in staff are tied up for long periods and work efficiency is hard to improve.
By using a computer to perform digital image processing on the key frames of the polling video, the text regions in the uploaded monitoring pictures can be located and cropped, assisting back-office check-in staff in verifying the attendance of each venue, reducing their workload, and improving work efficiency.
In recent years, researchers have studied methods of locating polling video text using digital image processing algorithms; for example, the application CN201710027704.1 by inventors Liu Mingzhu, Li Wenjing and Zheng Yunfei, entitled "Text information feature extraction and recognition method based on Gabor filter", is one such method. However, most existing methods adopt deep learning algorithms and suffer from drawbacks such as large sample requirements, high algorithmic complexity, and heavy computation.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a polling video text positioning method and system, so as to improve the speed and accuracy of text positioning in polling video.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the present invention provides a polling video text positioning method, comprising:
Acquiring a polling video and extracting a polling video key frame image;
Preprocessing a key frame image to obtain an LAB image and a gray level image;
dividing pixel points in the LAB image into a plurality of super pixels;
Screening super pixels forming candidate text areas by combining the gray level images;
classifying the super pixels forming the candidate text region;
performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
Further, the specific steps of extracting the polling video key frame image are as follows:
calculating the frame difference distance of each frame of image, comparing it with a set threshold, and judging the image to be a key frame image when the frame difference distance exceeds the threshold, and a non-key frame image otherwise.
Further, the preprocessing includes:
Performing morphological dilation operation on the key frame image to obtain a dilated image;
converting the expanded image from the RGB color space to the LAB color space to obtain an LAB image;
And carrying out gray scale processing on the key frame image to obtain a gray scale image.
Further, the specific step of dividing the pixel points in the LAB image into a plurality of super pixels is as follows:
dividing pixel points in the LAB image into a plurality of initial super pixels with the same size;
calculating a clustering center of each super pixel;
for each pixel in the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels in the LAB image, and updating the super pixel to which each pixel belongs;
judging whether the iteration has finished; if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel in the LAB image belongs.
Further, the specific steps of screening the super pixels forming the candidate text region are as follows:
calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram;
taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
Further, the specific steps of classification are as follows:
calculating the similarity between the super pixels forming the candidate text region;
setting iteration times, and iteratively calculating an attraction degree matrix and a attribution degree matrix based on the similarity;
based on the attraction degree matrix and the attribution degree matrix, a decision matrix is calculated, and the super pixels are divided into a plurality of classes.
Further, the specific steps of the corner detection are as follows:
Calculating the gradient value of each pixel point in the gray level image in the horizontal and vertical directions, and weighting by using a Gaussian function to obtain an autocorrelation matrix;
calculating the corner response value of each pixel point based on the autocorrelation matrix, comparing the corner response value with a corner discrimination threshold value, and selecting candidate corner points;
And carrying out local non-maximum suppression on the candidate corner points to obtain corner points in the gray level image.
A second aspect of the present invention provides a polling video text positioning system comprising:
a video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
a text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the polling video text positioning method described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
The polling video text positioning method provided by the invention locates the text regions of polling video pictures quickly and accurately by using clustering and corner detection algorithms, reducing the complexity of text region positioning and improving the applicability of the text positioning method.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of a method for locating a polling video text according to an embodiment of the present invention;
FIG. 2 is a block diagram of a polling video text positioning system according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 1, the present embodiment provides a polling video text positioning method that takes the text region of a polling video picture as its object. First, key frame images are extracted from the polling video and preprocessed to obtain an LAB image and a gray level image; the pixel points of the LAB image are divided into a plurality of super pixels, and the super pixels constituting candidate text regions are screened with the aid of the gray level image; the screened super pixels are then classified; finally, corner detection is performed on the gray level image, the number of corners contained in each class is counted, and the class with the most corners is selected as the text region in the video picture, improving both the speed and the accuracy of text positioning in polling video.
Step 1: and acquiring the polling video and extracting key frame images of the polling video.
Taking a conference polling video as an example: the conference polling video is formed during a video conference by each sub-venue uploading its monitoring picture in turn for a fixed duration. A conference polling video key frame is the video frame just before the switch between adjacent venues, i.e. the last frame of each venue's uploaded picture, and can represent the conference content of that venue.
The polling video key frames are extracted with a threshold detection algorithm: the frame difference distance of each frame of image is calculated and compared with a set threshold; when the frame difference distance exceeds the threshold, the frame is judged to be a key frame, and otherwise a non-key frame. Specifically:
(101) Calculating the frame difference distance of each frame of image:

δ_k = Σ_{i∈{R,G,B}} Σ_{(x,y)∈D_1} |F_k(x,y,i) − F_{k+1}(x,y,i)|,  k = 1, 2, …, K−1

wherein K is the total number of frames of the video; δ_k is the frame difference distance of the k-th frame; D_1 is a rectangular discrimination region of the video picture, chosen so that it differs strongly between venues and is highly similar within the same venue; F_k(x,y,i) and F_{k+1}(x,y,i) are the elements in row x and column y of the channel-i pixel value matrices of the k-th and (k+1)-th frames, respectively;
(102) Setting a suitable threshold T_1 from the calculated data samples; when δ_k > T_1, the k-th frame is judged to be a key frame; when δ_k ≤ T_1, the k-th frame is judged to be a non-key frame.
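By way of illustration only, a minimal Python sketch of this key-frame extraction (assuming OpenCV is available; the discrimination region D_1 and the threshold T_1 are placeholders to be tuned on real data) might read:

```python
import cv2
import numpy as np

def extract_key_frames(video_path, roi, threshold):
    """Key-frame extraction by frame differencing over a discrimination region.

    roi = (y0, y1, x0, x1) delimits the rectangular region D_1 (an assumed
    layout); threshold corresponds to T_1 and is chosen empirically.
    """
    cap = cv2.VideoCapture(video_path)
    key_frames = []
    ok, prev = cap.read()
    while ok:
        ok, curr = cap.read()
        if not ok:
            break
        y0, y1, x0, x1 = roi
        # Frame difference distance: sum of absolute channel differences over D_1
        delta = np.abs(prev[y0:y1, x0:x1].astype(np.int32)
                       - curr[y0:y1, x0:x1].astype(np.int32)).sum()
        if delta > threshold:
            key_frames.append(prev)  # the last frame before the venue switch
        prev = curr
    cap.release()
    return key_frames
```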
Step 2: and preprocessing the extracted key frame image to obtain an LAB image and a gray level image.
(201) Performing a morphological dilation on the acquired key frame image F with a square structuring element C to obtain the dilated image F_dilate:

F_dilate(x,y,i) = max{ F(x−x_c, y−y_c, i) | (x−x_c, y−y_c) ∈ D_f; (x_c, y_c) ∈ D_c },  i ∈ {R, G, B}

wherein C is a 4×4 all-zero matrix (a flat structuring element) and D_f, D_c are the pixel coordinate domains of F and C, respectively;
(202) Converting the image F_dilate from the RGB color space to the LAB color space to obtain the LAB image F_dilate;
(203) Converting the key frame image F to gray scale to obtain the corresponding gray level image F_gray.
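For this preprocessing step, a sketch along the following lines would suffice (again assuming OpenCV, whose decoded frames are BGR rather than RGB; a 4×4 flat all-ones kernel stands in for the all-zero structuring element C):

```python
import cv2
import numpy as np

def preprocess(key_frame):
    """Produce the LAB image and the gray level image of a key frame."""
    # (201) Morphological dilation with a 4x4 flat structuring element
    kernel = np.ones((4, 4), np.uint8)
    dilated = cv2.dilate(key_frame, kernel)
    # (202) Color-space conversion to LAB (OpenCV stores frames as BGR)
    lab = cv2.cvtColor(dilated, cv2.COLOR_BGR2LAB)
    # (203) Gray level image from the original (un-dilated) key frame
    gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    return lab, gray
```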
Step 3: super-pixel segmentation of the LAB image F_dilate based on the SLIC (Simple Linear Iterative Clustering) algorithm, dividing the pixel points of the LAB image into a plurality of super pixels: divide the pixel points of the LAB image into a plurality of initial super pixels of the same size; compute the cluster center of each super pixel; for each pixel, find the nearest cluster center among the cluster centers within its neighborhood, take the corresponding super pixel as the pixel's new super pixel, traverse all pixels of the LAB image and update the super pixel of each; judge whether the iteration has finished, and if not, return to recompute the cluster centers of the super pixels; otherwise, merge every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and update the super pixel of each pixel in the LAB image.
The SLIC algorithm segments the LAB image F_dilate into super pixels, each consisting of a series of pixel points with similar color, brightness and other characteristics. The algorithm flow is as follows:
(301) For each pixel of the LAB image F_dilate, constructing a 5-dimensional feature vector [l, a, b, x, y] from the pixel's l, a, b components in the LAB color space and its x, y coordinates in the pixel coordinate system, to describe the pixel's features;
(302) Setting the number of super pixels h = 300 and defining a grid interval s = √(N/h), where N is the total number of pixels of the image F_dilate; dividing the pixel points of the LAB image into a plurality of initial super pixels, and initializing the cluster center cluster_i = [l_i, a_i, b_i, x_i, y_i] (i = 1, 2, …, 300) of each initial super pixel so that the centers are uniformly distributed at the grid interval;
(303) Calculating the distance between each pixel of the LAB image and the cluster centers; the distance measure between pixel n (n = 1, 2, …, N) and cluster center cluster_i is defined as

d_c = √((l_n − l_i)² + (a_n − a_i)² + (b_n − b_i)²)
d_s = √((x_n − x_i)² + (y_n − y_i)²)
d_ni = √((d_c / p_c)² + (d_s / s)²)

wherein d_ni is the integrated distance; d_c and d_s are the color distance and the spatial distance, respectively; p_c is a color coefficient that adjusts the compactness of the super pixels, set to 30;
(304) Performing T iterations (preferably T = 10), updating the super pixels and their cluster centers in each iteration as follows:
among the cluster centers contained in the 2s×2s neighborhood of pixel n, taking the super-pixel label i of the cluster center cluster_i with the smallest integrated distance to the pixel as the pixel's new label, denoted label_n = i;
traversing all pixels of the image, computing each pixel's new label, and re-segmenting the super pixels according to the new labels, with the mean of the 5-dimensional feature vectors of all pixel points within a super pixel taken as that super pixel's cluster center;
(305) Merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the label of each pixel of the image, denoted label_n = j (j = 1, 2, …, h_1), where h_1 is the number of super pixels after updating and h_1 ≤ h. Preferably, the threshold is 50.
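Rather than re-implementing SLIC, this step can be reproduced with scikit-image's implementation; the sketch below maps the embodiment's parameters onto it (the keyword max_num_iter applies to recent scikit-image releases, and enforce_connectivity plays roughly the role of the small-super-pixel merging of step (305)):

```python
from skimage.segmentation import slic

# lab_image: the LAB image F_dilate produced by the preprocessing step
labels = slic(
    lab_image,
    n_segments=300,             # h = 300 initial super pixels
    compactness=30,             # p_c = 30, the color/space trade-off
    max_num_iter=10,            # T = 10 iterations
    convert2lab=False,          # the input is already in LAB space
    enforce_connectivity=True,  # absorbs undersized fragments, cf. step (305)
)
# labels[y, x] is the super-pixel index of the pixel at (x, y)
```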
Step 4: super-pixel screening based on the gray level histogram: screening the super pixels included in the candidate text region with the aid of the histogram of the gray level image. Specifically:
(401) For the h_1 super pixels produced by the SLIC algorithm, calculating the gray level histogram of each super pixel in the gray level image F_gray, and taking the mean frequency of adjacent gray levels (preferably 3 adjacent gray levels) as the frequency of the middle gray level, to obtain a smoothed gray level histogram;
(402) Traversing all super pixels and taking the M super pixels whose smoothed gray level histogram has n peaks (preferably n = 2, since a super pixel covering text typically contains both the stroke gray level and the board gray level) into the candidate text region, and constructing a two-dimensional vector [peak_1, peak_2] from the gray values corresponding to the peaks to describe each qualifying super pixel; that is, the gray values at the peaks serve as the features of the super pixels constituting the candidate text region.
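A NumPy sketch of this screening step follows; the simple interior-maximum peak test is an assumption, since the embodiment does not spell out its peak criterion:

```python
import numpy as np

def screen_superpixels(gray, labels, n_peaks=2, smooth=3):
    """Return (label, [peak1, peak2]) pairs for candidate text super pixels."""
    candidates = []
    for sp in np.unique(labels):
        values = gray[labels == sp]
        hist, _ = np.histogram(values, bins=256, range=(0, 256))
        # Smoothing: each gray level takes the mean frequency of its neighbors
        hist = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
        # Peaks: gray levels whose frequency exceeds both neighbors'
        peaks = [g for g in range(1, 255)
                 if hist[g] > hist[g - 1] and hist[g] > hist[g + 1]]
        if len(peaks) == n_peaks:
            candidates.append((sp, peaks))
    return candidates
```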
Step 5: classifying the super pixels constituting the candidate text region based on the AP (Affinity Propagation) algorithm: calculating the similarity between the super pixels of the candidate text region from their features; setting the number of iterations and iteratively computing the attraction matrix and attribution matrix from the similarity; computing the decision matrix from the attraction and attribution matrices and dividing the super pixels into a plurality of classes. Specifically:
The AP algorithm adaptively classifies the M super pixels constituting the candidate text region; each class of similar super pixels forms a cluster, so that the candidate text region can be further narrowed to a single class. The algorithm flow is as follows:
(501) Calculating the similarity between super pixels from the two-dimensional vectors [peak_1, peak_2] and constructing the similarity matrix S:

S(i,k) = −[(peak_{1i} − peak_{1k})² + (peak_{2i} − peak_{2k})²]

wherein S(i,k) is the similarity between the i-th and k-th super pixels, and [peak_{1i}, peak_{2i}] and [peak_{1k}, peak_{2k}] are the 2-dimensional vectors of the i-th and k-th super pixels, respectively; the main diagonal element S(i,i), which represents the suitability of the i-th super pixel as a cluster center, is set to the mean of the elements of the matrix;
(502) Calculating the attraction matrix R and the attribution matrix A:
The attraction R(i,k) represents the degree to which the i-th super pixel supports the k-th super pixel as a cluster center:

R(i,k) = S(i,k) − max_{k'≠k} { A(i,k') + S(i,k') }

wherein, on the first pass, the attribution matrix A is set to the zero matrix;
The attribution A(i,k) represents the suitability of the k-th super pixel as the cluster center of the i-th super pixel; when i ≠ k,

A(i,k) = min{ 0, R(k,k) + Σ_{i'∉{i,k}} max{0, R(i',k)} }

and when i = k,

A(k,k) = Σ_{i'≠k} max{0, R(i',k)}
(503) Updating the attraction matrix R and the attribution matrix A with damping:

R_{t+1} := (1−λ)·R_{t+1} + λ·R_t
A_{t+1} := (1−λ)·A_{t+1} + λ·A_t

wherein R_{t+1} and A_{t+1} are the attraction and attribution matrices of the (t+1)-th iteration, and R_t and A_t those of the t-th iteration; λ is a damping coefficient, set to 0.5, that balances the results of two adjacent iterations; in each ':=' assignment, the matrix on the left-hand side is the updated value and the matrix of the same name on the right-hand side is the pre-update value.
(504) Repeating steps (502) and (503); the number of iterations is set according to the data volume, preferably 30;
(505) Letting the decision matrix E = R_30 + A_30; if the maximum of the i-th row of E is E(i,k), the i-th super pixel belongs to the cluster centered on the k-th super pixel; traversing every row of E divides the M super pixels into q clusters, where q ≤ M.
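scikit-learn's AffinityPropagation implements the same responsibility/availability message passing; a sketch with the embodiment's settings follows (one small difference: scikit-learn's default preference is the median similarity, whereas step (501) uses the mean):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# features: M x 2 array of [peak1, peak2] vectors from the screening step
features = np.array([peaks for _, peaks in candidates], dtype=float)

ap = AffinityPropagation(
    damping=0.5,        # the damping coefficient lambda of step (503)
    max_iter=30,        # 30 iterations, as in step (504)
    preference=None,    # default: median similarity (the patent uses the mean)
    random_state=0,
)
# affinity='euclidean' (the default) uses negative squared Euclidean
# similarity, matching the S(i,k) of step (501)
cluster_labels = ap.fit_predict(features)  # cluster index of each super pixel
```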
Step 6: text region discrimination based on corner detection: performing corner detection, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
The Harris operator is used to detect corners over the q clusters in the image F_gray, and the cluster with the most corners is selected as the text region in the video picture. The corner detection proceeds as follows:
(601) Calculating the gradients F_x and F_y of each pixel of the gray level image F_gray in the horizontal and vertical directions, weighting them with a Gaussian function ω(x,y), and constructing the autocorrelation matrix M:

M = Σ_{(x,y)} ω(x,y) · [ F_x²  F_x·F_y ; F_x·F_y  F_y² ]
(602) Calculating the corner response value R of each pixel of the gray level image F_gray from the autocorrelation matrix:

R = det M − β (trace M)²

wherein det M is the determinant of the matrix M and trace M is its trace; β is an empirical parameter, set to 0.05;
comparing each pixel's corner response value with a corner discrimination threshold to select candidate corners, specifically: setting a corner discrimination threshold T_2 and judging a target pixel to be a candidate corner if R > T_2;
(603) Performing local non-maximum suppression on the candidate corners to obtain the corners of the image F_gray.
The number of corners contained in each of the q clusters is counted, and the cluster containing the most corners is selected as the text region in the video picture.
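A closing sketch of this step (cv2.cornerHarris computes R = det M − k·(trace M)²; the relative threshold below is an assumed stand-in for T_2 and for the local non-maximum suppression, which the OpenCV call itself does not perform):

```python
import cv2
import numpy as np

def locate_text_region(gray, labels, candidates, cluster_labels, rel_thresh=0.01):
    """Select the cluster containing the most Harris corners as the text region."""
    # Harris response map; k = 0.05 matches the beta of step (602)
    response = cv2.cornerHarris(np.float32(gray), 2, 3, 0.05)
    corners = response > rel_thresh * response.max()  # stand-in for T_2 / NMS

    # Count corners per cluster over the super pixels belonging to it
    counts = {}
    for (sp, _), cl in zip(candidates, cluster_labels):
        counts[cl] = counts.get(cl, 0) + int(corners[labels == sp].sum())
    best = max(counts, key=counts.get)

    # Binary mask of the located text region
    mask = np.zeros(gray.shape, dtype=bool)
    for (sp, _), cl in zip(candidates, cluster_labels):
        if cl == best:
            mask |= labels == sp
    return mask
```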
By applying clustering and corner detection algorithms appropriately, the invention locates the text region of a polling video picture quickly and accurately, reduces computational complexity, and improves applicability.
Example 2
As shown in fig. 2, the present embodiment provides a polling video text positioning system, which specifically includes the following modules:
The device comprises a video key frame extraction module, an image preprocessing module, a super-pixel segmentation module, a super-pixel screening module, a super-pixel classification module and a text region discrimination module; the video key frame extraction module is connected with the image preprocessing module, the output end of the image preprocessing module is connected to the super-pixel segmentation module, the output end of the super-pixel segmentation module is connected to the super-pixel screening module, the output end of the super-pixel screening module is connected to the super-pixel classification module, and the output end of the super-pixel classification module is connected to the text region discrimination module.
A key frame image is obtained by the video key frame extraction module and preprocessed by the image preprocessing module; the super-pixel segmentation module segments the image based on the SLIC algorithm; the super-pixel screening module obtains the candidate text region based on the gray level histogram; the super-pixel classification module partitions the candidate text region based on the AP algorithm; and finally the text region discrimination module locates the text region based on corner detection, which is significant for improving the speed and accuracy of text positioning in polling video.
A video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
a text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corners contained in each class, and selecting the class with the most corners as the text region in the video picture.
By applying clustering and corner detection algorithms appropriately, the system can locate the text region of a polling video picture quickly and accurately, reducing system complexity and improving the applicability of the system.
It should be noted that each module in this embodiment corresponds one-to-one with a step in Example 1 and is implemented in the same way, which is not repeated here.
Example 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method described in Example 1.
Example 4
The present embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the polling video text positioning method described in Example 1 when executing the program.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium which, when executed, performs the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A polling video text positioning method, comprising:
Acquiring a polling video and extracting a polling video key frame image;
Preprocessing a key frame image to obtain an LAB image and a gray level image;
dividing pixel points in the LAB image into a plurality of super pixels;
Screening super pixels forming candidate text areas by combining the gray level images;
classifying the super pixels forming the candidate text region;
Performing corner detection on the gray level image, counting the number of corner points contained in each class, and selecting the class with the largest number of corner points as a text area in a video picture;
The specific steps of dividing the pixel points of the LAB image into a plurality of super pixels are as follows: dividing the pixel points of the LAB image into a plurality of initial super pixels of the same size; calculating the cluster center of each super pixel; for each pixel of the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels of the LAB image and updating the super pixel to which each pixel belongs; judging whether the iteration has finished, and if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel of the LAB image belongs;
The specific steps of screening the super pixels constituting the candidate text region are as follows: calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram; and taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
2. The polling video text positioning method according to claim 1, wherein the specific steps of extracting the polling video key frame image are as follows:
calculating the frame difference distance of each frame of image, comparing it with a set threshold, and judging the image to be a key frame image when the frame difference distance exceeds the threshold, and a non-key frame image otherwise.
3. The polling video text positioning method of claim 1, wherein the preprocessing comprises:
Performing morphological dilation operation on the key frame image to obtain a dilated image;
converting the expanded image from the RGB color space to the LAB color space to obtain an LAB image;
And carrying out gray scale processing on the key frame image to obtain a gray scale image.
4. The polling video text positioning method according to claim 1, wherein the specific steps of classification are as follows:
calculating the similarity between the super pixels forming the candidate text region;
setting iteration times, and iteratively calculating an attraction degree matrix and a attribution degree matrix based on the similarity;
based on the attraction degree matrix and the attribution degree matrix, a decision matrix is calculated, and the super pixels are divided into a plurality of classes.
5. The polling video text positioning method as claimed in claim 1, wherein the specific steps of corner detection are as follows:
Calculating the gradient value of each pixel point in the gray level image in the horizontal and vertical directions, and weighting by using a Gaussian function to obtain an autocorrelation matrix;
calculating the corner response value of each pixel point based on the autocorrelation matrix, comparing the corner response value with a corner discrimination threshold value, and selecting candidate corner points;
And carrying out local non-maximum suppression on the candidate corner points to obtain corner points in the gray level image.
6. A polling video text positioning system, comprising:
a video key frame extraction module configured to: acquiring a polling video and extracting a polling video key frame image;
An image preprocessing module configured to: preprocessing a key frame image to obtain an LAB image and a gray level image;
a superpixel segmentation module configured to: dividing pixel points in the LAB image into a plurality of super pixels;
a superpixel screening module configured to: screening super pixels forming candidate text areas by combining the gray level images;
A superpixel classification module configured to: classifying the super pixels forming the candidate text region;
A text region discrimination module configured to: performing corner detection on the gray level image, counting the number of corner points contained in each class, and selecting the class with the largest number of corner points as a text area in a video picture;
The specific steps of dividing the pixel points of the LAB image into a plurality of super pixels are as follows: dividing the pixel points of the LAB image into a plurality of initial super pixels of the same size; calculating the cluster center of each super pixel; for each pixel of the LAB image, finding the nearest cluster center among the cluster centers within the pixel's neighborhood, taking the super pixel corresponding to that cluster center as the new super pixel to which the pixel belongs, traversing all pixels of the LAB image and updating the super pixel to which each pixel belongs; judging whether the iteration has finished, and if not, returning to recompute the cluster center of each super pixel; otherwise, merging every super pixel containing fewer pixels than a threshold into the neighboring super pixel with which it shares the longest boundary, and updating the super pixel to which each pixel of the LAB image belongs;
The specific steps of screening the super pixels constituting the candidate text region are as follows: calculating the gray level histogram of each super pixel in the gray level image, and taking the mean frequency of several adjacent gray levels in the histogram as the frequency of the middle gray level, to obtain a smoothed gray level histogram; and taking the super pixels whose smoothed gray level histogram has n peaks as the super pixels constituting the candidate text region, and taking the gray values corresponding to the peaks as the features of those super pixels.
7. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the polling video text positioning method as claimed in any one of claims 1-5.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the polling video text positioning method as claimed in any one of claims 1-5 when executing the program.
Priority Application (1)
- CN202111165413.1A: priority date 2021-09-30; filing date 2021-09-30; title: Method and system for locating polling video text (CN113887430B)
Publications (2)
- CN113887430A: published 2022-01-04
- CN113887430B: granted 2024-04-30
Family ID: 79005119 (family application CN202111165413.1A, status Active)
Patent Citations (5)
- CN103699895A (priority 2013-12-12, published 2014-04-02): Method for detecting and extracting text in video
- CN105528794A (priority 2016-01-15, published 2016-04-27): Moving object detection method based on Gaussian mixture model and superpixel segmentation
- CN106204570A (priority 2016-07-05, published 2016-12-07): A corner detection method based on a non-causal fractional-order gradient operator
- CN112150412A (priority 2020-08-31, published 2020-12-29): Insulator self-explosion defect detection method based on projection curve analysis
- CN112990191A (priority 2021-01-06, published 2021-06-18): Shot boundary detection and key frame extraction method based on subtitle video
Family Cites (1)
- JP4861845B2 (priority 2007-02-05, Fujitsu): Telop character extraction program, recording medium, method and apparatus
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant