CN108038458A - Automatic acquisition method for outdoor scene text in video based on a feature summary map - Google Patents

Automatic acquisition method for outdoor scene text in video based on a feature summary map

Info

Publication number
CN108038458A
CN108038458A (application CN201711381971.5A; granted as CN108038458B)
Authority
CN
China
Prior art keywords
degree
feature
notable
video frame
directions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711381971.5A
Other languages
Chinese (zh)
Other versions
CN108038458B (en)
Inventor
黄晓冬
王勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN201711381971.5A priority Critical patent/CN108038458B/en
Publication of CN108038458A publication Critical patent/CN108038458A/en
Application granted granted Critical
Publication of CN108038458B publication Critical patent/CN108038458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 Detecting features for summarising video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

A method for automatically acquiring outdoor scene text in video, based on a feature summary map. First, video frames containing scene text are obtained, and a video-frame feature summary map is generated from the RGB colour space of each frame: four convolution maps are extracted on the RGB colour space in the horizontal, vertical, 45-degree and 135-degree directions, yielding four direction feature vectors that characterise the colour space; from these, ten saliency maps representing different directions of the video frame are obtained and fused to produce the feature summary map. K-means colour clustering is then performed on the feature summary map together with the colour space, giving four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is applied to each of the four classes, the background and noise regions are removed, and the final outdoor scene text is obtained automatically. The operating steps of the invention are simple and the computation is cheap; outdoor scene text can be recognised and acquired in real time, so the method has good prospects for wide application.

Description

Automatic acquisition method for outdoor scene text in video based on a feature summary map
Technical field
The present invention relates to a digital image processing method, and more precisely to a method for automatically acquiring outdoor scene text in video based on a feature summary map, belonging to the technical field of computer vision processing.
Background art
In the past several years, with the spread of digital image acquisition devices, smartphones and practical vision systems and equipment, content-based image understanding technology has attracted growing attention. Because scene text in images and video carries comparatively rich and direct semantic clues, it is regarded as an important object to be detected and recognised. Text detection, localisation, extraction and recognition are the key steps in obtaining textual information; the operations of detection, localisation and extraction are usually referred to collectively as text acquisition. For text recognition, text acquisition is an essential prerequisite, since it reduces the complex background and eliminates illumination effects, making recognition comparatively simple and easy. However, many adverse factors, such as uneven indoor and outdoor illumination, blurred video, complex backgrounds, perspective distortion, colour diversity, complex fonts and varying stroke widths, all pose serious challenges to the acquisition of video scene text.
At present, researchers at home and abroad have developed a variety of methods for acquiring video scene text. Scene text extraction is generally divided into two steps: (1) detection and localisation of the scene text, and (2) extraction of the scene text.
Prior-art scene text detection and localisation methods can be divided into four different kinds: colour-based, edge/gradient-based, texture-based and stroke-based. Specifically:
Colour-based scene text detection: this is a conventional approach that has been proposed and used for more than twenty years, and it is simple and efficient. Detection algorithms based on local thresholds are commonly used, and some researchers adopt the improved local-threshold acquisition of the Niblack algorithm, so that the method can quickly detect scene text against fairly simple backgrounds. Researchers have also proposed generating colour layers with the mean-shift algorithm, which markedly improves the robustness of text detection against complex backgrounds. However, when characters of multiple colours appear in a video or image under uneven illumination, colour-feature-based text detection runs into many problems.
Edge/gradient-based scene text detection: assuming that a text region exhibits strong and symmetric changes against the background, pixels with large, symmetric gradient values can be regarded as text pixels, which allows edge features and gradient features to be used for scene text detection. Researchers have also proposed a scene text detection algorithm based on edge enhancement; such work applies spatial constraints based on size, position and colour distance, and finds text candidate regions by clustering horizontally aligned "gradient vector flow". Detection algorithms based on AdaBoost classifiers have been proposed that combine gradient/edge features with various classifiers (such as artificial neural networks or AdaBoost); going further, a detection method has been proposed that adds a neural-network-based text localiser on top of the AdaBoost classifier. However, algorithms of this kind struggle to detect scene text against complex backgrounds with strong gradients.
Texture-based scene text detection: when the character region is fairly dense, scene text can be treated as a kind of texture. Many current methods detect scene text by extracting texture features, including the Fourier transform, the discrete cosine transform (DCT), wavelets, local binary patterns (LBP) and histograms of oriented gradients (HOG). Although texture features can effectively detect dense characters, this approach may fail on sparse characters. Researchers have therefore proposed detecting scene text from Fourier-domain features and from DCT coefficients in the frequency domain, and more recently an algorithm based on local Haar binary pattern features. However, when the background is complex, many background noise regions also exhibit text-like texture, which lowers the detection precision of this approach.
Stroke-based scene text detection: the stroke width transform (SWT) is used to compute the most probable stroke pixel width. Stroke-based features have proven very effective for high-resolution scene text detection, particularly when combined with suitable learning methods or fused with further features such as edge orientation variance (EOV), opposite edge pairs (OEPs) or spatial-temporal analysis. Recently, a Bandlet-based edge detector has been introduced to improve the edge discrimination of SWT, enhancing scene text and removing noisy edge points so that SWT can be used to detect low-resolution text. However, when detecting scene text whose characters vary in size and font, the detection precision of this approach declines markedly.
Prior-art scene text extraction methods can be divided into at least three kinds of algorithms: threshold-based, colour-based and character-stroke-based. Specifically:
Threshold-based text extraction: this approach falls into two sub-classes of algorithms: one uses a global threshold, such as the Otsu algorithm; the other uses a local threshold. A multi-threshold algorithm has also been proposed, in which the second-stage threshold depends on the threshold of the first stage, which noticeably improves the extraction result. But because threshold-based methods ignore the characteristics of scene text, they do not achieve satisfactory performance in this way.
Colour-based text extraction: this method first generates several candidate binary images using k-means or another clustering algorithm, then selects among the binary images by image analysis. Its main feature is the assumption that text colour is uniform, introducing colour clustering into scene text extraction. Its drawbacks are that, as a global computation, it is rather sensitive to non-uniform illumination, and both the computational cost of analysing multiple candidate images and the selection of the parameter k are extremely involved.
Stroke-based text extraction: two groups of asymmetric Gabor filters are first used to extract texture orientation and scale from the image, and these features are then applied to the character edges most likely to represent text, to enhance contrast. However, the algorithm is very sensitive to the size of the extracted characters and is unsuitable for extracting scene text from video.
In short, the prior-art techniques for detecting and localising scene text, and for extracting it, are unsatisfactory in many respects. Developing a better-performing and more complete method for acquiring scene text in video has therefore become a new problem of particular concern to technical personnel in the field.
Summary of the invention
In view of this, the object of the present invention is to provide a method for automatically acquiring outdoor scene text in video based on a feature summary map. The method remedies numerous defects of the prior art: it can correctly and completely acquire scene text under various difficult conditions, including uneven illumination, blurred or complex backgrounds, perspective distortion, diverse colours, complex fonts and varying stroke widths.
To achieve the above object, the present invention provides a method for automatically acquiring outdoor scene text in video based on a feature summary map, characterised in that the method comprises the following operational steps:
Step 1: obtain the video frame images containing scene text, and generate a video-frame feature summary map from the RGB colour space of the video frame images. First, four convolution maps are extracted on the RGB colour space, in the horizontal, vertical, 45-degree and 135-degree directions, giving the four direction feature vectors that characterise the RGB colour space. These four direction feature vectors are then multiplied pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame. Finally, the ten saliency maps of the different directions are fused to obtain the video-frame feature summary map, which serves as the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves recognition precision;
Step 2: acquire the scene text automatically. K-means colour clustering is first performed on the feature summary map together with the RGB colour space, subdividing the frame into four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is then applied to each of the four classes, the background and noise regions are removed, and the final scene text is obtained.
At present, where the background is complex and the illumination changeable, acquiring outdoor video scene text is extremely difficult. The innovation of the present invention is an automatic method for acquiring outdoor scene text in video. Its technical key is a brand-new video-frame feature summary map, which serves as the visual characteristic and basis for the automatic acquisition of scene text in video; the method removes background and noise interference well and markedly improves the accuracy and completeness of scene text detection and extraction. At the same time, K-means colour clustering is performed on the feature summary map together with the hue, saturation, value (HSV) colour space, and connected-component analyses based on stroke width and on geometric shape are then performed in turn; once the background and noise regions are removed, the final video scene text can be obtained quickly and automatically.
Multiple simulation experiments prove that the present invention remedies the defects of the prior art: in outdoor video with complex backgrounds, perspective distortion, diverse colours, uneven or strongly varying illumination, complex fonts and varying stroke widths, scene text can still be detected and extracted quickly, correctly and automatically. Moreover, the operating steps are fairly simple, the computational complexity is low, and the method is easy to implement; it can meet the demand for recognising and acquiring outdoor scene text in real time. The present invention therefore has good prospects for wide application.
Brief description of the drawings
Fig. 1 is a flow chart of the operating steps of the method of the present invention for automatically acquiring outdoor scene text in video based on a feature summary map.
Fig. 2 is a flow chart of the operations of step 1 of the method.
Fig. 3 is a flow chart of the operations of step 2 of the method.
Fig. 4 (A), (B) and (C) are schematic diagrams of three stages of an embodiment of the method: the original image, the video-frame feature summary map, and the acquired scene text.
Embodiment
To make the object, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
Referring to Fig. 1, the concrete operating steps of the method of the present invention for automatically acquiring outdoor scene text in video based on a feature summary map are as follows:
Step 1: obtain the video frame images containing scene text, and generate a video-frame feature summary map from the RGB colour space of the video frame images. First, four convolution maps are extracted on the RGB colour space, in the horizontal, vertical, 45-degree and 135-degree directions, giving the four direction feature vectors that characterise the RGB colour space. These four direction feature vectors are then multiplied pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame. Finally, the ten saliency maps of the different directions are fused to obtain the video-frame feature summary map, which serves as the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves recognition precision.
Among the four convolution maps extracted on the RGB colour space in the horizontal, vertical, 45-degree and 135-degree directions, the horizontal convolution map uses the horizontal-differential template of the Sobel operator:

    [-1 -2 -1]
    [ 0  0  0]
    [ 1  2  1]

the vertical convolution map uses the vertical-differential template of the Sobel operator:

    [-1  0  1]
    [-2  0  2]
    [-1  0  1]

the 45-degree convolution map uses the template for computing the 45-degree differential:

    [ 0  1  2]
    [-1  0  1]
    [-2 -1  0]

and the 135-degree convolution map uses the template for computing the 135-degree differential:

    [-2 -1  0]
    [-1  0  1]
    [ 0  1  2]

The kernel-based convolution feature extraction used by the present invention has the advantages that the algorithm is simple and the computation fast, which favours engineering implementation, and the extracted convolution features are not easily affected by the illumination changes of outdoor scenes.
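The directional convolutions of step 1 can be sketched in Python as follows. This is an illustrative implementation, not part of the patent: the function names are invented, and the diagonal kernels are the common 45/135-degree Sobel variants, assumed to match the templates named above.

```python
import numpy as np

# Standard Sobel-style templates for the four directions.
K_H   = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)    # horizontal edges
K_V   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)    # vertical edges
K_45  = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)    # 45-degree edges
K_135 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float)    # 135-degree edges

def convolve(channel: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size cross-correlation with zero padding (sufficient for this sketch)."""
    h, w = channel.shape
    pad = np.pad(channel, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * kernel)
    return out

def direction_maps(rgb: np.ndarray) -> dict:
    """Return the four direction feature vectors H, V, L, R of step (11):
    each is an (h, w, 3) stack of the per-channel convolution maps."""
    maps = {}
    for name, k in (("H", K_H), ("V", K_V), ("L", K_45), ("R", K_135)):
        maps[name] = np.dstack([convolve(rgb[..., c], k) for c in range(3)])
    return maps
```

On a frame with a horizontal step edge, only the horizontal kernel responds away from the image borders, which is the direction selectivity the saliency maps of step (12) build on.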
Step 2: acquire the scene text automatically. K-means colour clustering is first performed on the feature summary map together with the RGB colour space, subdividing the frame into four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is then applied to each of the four classes, the background and noise regions are removed, and the final scene text is obtained.
The colour clustering in the K-means algorithm used by the present invention is a four-dimensional clustering performed in the joint space of the video summary map and hue, saturation and value, assigning each pixel according to its cosine-of-angle distance to each of the four cluster centres. Because illumination in the outdoor environment varies violently, the text characters in a video frame take on different colours, which seriously harms the completeness of the extracted strokes. The four-dimensional clustering therefore uses, instead of the common Euclidean distance, a cosine-of-angle distance function that does not emphasise numerical differences themselves, which markedly reduces the influence of outdoor illumination changes on character colour.
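A minimal sketch of K-means driven by cosine distance, as in the four-dimensional clustering described above. The farthest-point seeding and the fixed iteration count are implementation choices of this sketch, not specified by the patent.

```python
import numpy as np

def kmeans_cosine(features: np.ndarray, k: int = 4, iters: int = 20, seed: int = 0):
    """K-means with cosine (angular) distance instead of Euclidean distance.
    `features` is (n, d): one row per pixel, e.g. [summary, h, s, v]."""
    rng = np.random.default_rng(seed)
    norm = lambda x: x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)
    f = norm(features.astype(float))
    # Farthest-angle seeding (an assumption of this sketch, not from the patent):
    centers = [f[rng.integers(len(f))]]
    for _ in range(k - 1):
        sims = np.max(np.stack([f @ c for c in centers]), axis=0)
        centers.append(f[np.argmin(sims)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(f @ centers.T, axis=1)  # closest-angle centre wins
        for c in range(k):
            members = f[labels == c]
            if len(members):
                centers[c] = norm(members.mean(axis=0))
    return labels, centers
```

Because vectors are normalised before clustering, two pixels whose colours differ only in brightness point in nearly the same direction and land in the same cluster, which is the illumination robustness the patent attributes to the cosine distance.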
Referring to Fig. 2, the concrete operations of step 1 of the above two operating steps are first described in detail:
(11) Extract on the red channel the horizontal convolution map Rh, the vertical convolution map Rv, the 45-degree convolution map Rl and the 135-degree convolution map Rr; extract on the green channel the horizontal convolution map Gh, the vertical convolution map Gv, the 45-degree convolution map Gl and the 135-degree convolution map Gr; and extract on the blue channel the horizontal convolution map Bh, the vertical convolution map Bv, the 45-degree convolution map Bl and the 135-degree convolution map Br. Arrange these directional convolution maps according to the RGB colour space to obtain the four direction feature vectors characterising the RGB colour space: the horizontal feature vector H = {Rh, Gh, Bh}, the vertical feature vector V = {Rv, Gv, Bv}, the 45-degree feature vector L = {Rl, Gl, Bl} and the 135-degree feature vector R = {Rr, Gr, Br}.
(12) Multiply the four direction feature vectors pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame, so that while the edge features of several set directions are retained, the background and noise interference of the remaining directions are removed and the stroke features of the scene text in several directions are obtained, which helps extract the scene text automatically. Step (12) is subdivided into the following operations:
(120) Compute the element-wise square of the horizontal feature vector according to Shh = {Rh, Gh, Bh} × {Rh, Gh, Bh} to obtain the horizontal saliency map Shh, which retains and strengthens the horizontal edge features while weakening those of the other directions.
(121) Compute the element-wise square of the vertical feature vector according to Svv = {Rv, Gv, Bv} × {Rv, Gv, Bv} to obtain the vertical saliency map Svv, which retains and strengthens the vertical edge features while weakening those of the other directions.
(122) Compute the element-wise square of the 45-degree feature vector according to Sll = {Rl, Gl, Bl} × {Rl, Gl, Bl} to obtain the 45-degree saliency map Sll, which retains and strengthens the 45-degree edge features while weakening those of the other directions.
(123) Compute the element-wise square of the 135-degree feature vector according to Srr = {Rr, Gr, Br} × {Rr, Gr, Br} to obtain the 135-degree saliency map Srr, which retains and strengthens the 135-degree edge features while weakening those of the other directions.
(124) Compute the element-wise product of the horizontal and vertical feature vectors according to Shv = {Rh, Gh, Bh} × {Rv, Gv, Bv} to obtain the horizontal-vertical saliency map Shv, which retains and strengthens the horizontal-vertical edge features while weakening those of the other directions.
(125) Compute the element-wise product of the horizontal and 45-degree feature vectors according to Shl = {Rh, Gh, Bh} × {Rl, Gl, Bl} to obtain the horizontal-45-degree saliency map Shl, which retains and strengthens the horizontal-45-degree edge features while weakening those of the other directions.
(126) Compute the element-wise product of the horizontal and 135-degree feature vectors according to Shr = {Rh, Gh, Bh} × {Rr, Gr, Br} to obtain the horizontal-135-degree saliency map Shr, which retains and strengthens the horizontal-135-degree edge features while weakening those of the other directions.
(127) Compute the element-wise product of the vertical and 45-degree feature vectors according to Svl = {Rv, Gv, Bv} × {Rl, Gl, Bl} to obtain the vertical-45-degree saliency map Svl, which retains and strengthens the vertical-45-degree edge features while weakening those of the other directions.
(128) Compute the element-wise product of the vertical and 135-degree feature vectors according to Svr = {Rv, Gv, Bv} × {Rr, Gr, Br} to obtain the vertical-135-degree saliency map Svr, which retains and strengthens the vertical-135-degree edge features while weakening those of the other directions.
(129) Compute the element-wise product of the 45-degree and 135-degree feature vectors according to Slr = {Rl, Gl, Bl} × {Rr, Gr, Br} to obtain the 45-135-degree saliency map Slr, which retains and strengthens the 45-135-degree edge features while weakening those of the other directions.
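The ten products of steps (120) to (129) can be generated mechanically. A sketch, under the assumption that each element-wise product is summed over the three colour channels to give a single-channel saliency map (the patent leaves this channel reduction implicit):

```python
import numpy as np
from itertools import combinations_with_replacement

def saliency_maps(H, V, L, R):
    """Form the ten saliency maps Shh ... Slr as element-wise products of the
    direction feature vectors (each an (h, w, 3) array), summed over the
    colour channels so each map is a single-channel image."""
    vecs = {"h": H, "v": V, "l": L, "r": R}
    maps = {}
    # combinations_with_replacement over "hvlr" yields exactly the ten pairs
    # hh, hv, hl, hr, vv, vl, vr, ll, lr, rr of steps (120)-(129).
    for a, b in combinations_with_replacement("hvlr", 2):
        maps["S" + a + b] = np.sum(vecs[a] * vecs[b], axis=-1)
    return maps
```

The four squared maps (Shh, Svv, Sll, Srr) are non-negative by construction, while the six cross-products can take either sign where two directions disagree.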
(13) Fuse the ten saliency maps of the different directions to obtain the video-frame feature summary map, which provides the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves the precision and completeness of the automatic scene text acquisition.
In step (13), the fusion of the ten directional saliency maps proceeds as follows: based on the ten saliency maps of different directions extracted in step (12), the maximum, minimum and average of the pixels at the same coordinates in each image are computed, and the results are superimposed to obtain the final video-frame summary map fsg. The operations of step (13) are as follows:
(131) At each coordinate, take the minimum over the ten directional saliency maps to form the minimum feature saliency map Smin(x, y) = min(pi(x, y)), where pi(x, y) is the pixel value at coordinate (x, y) of saliency map i, the subscript i is the saliency map class, i ∈ {Shh, Svv, Sll, Srr, Shv, Shl, Shr, Svl, Svr, Slr}, and the function min is the operator extracting the minimum of the pixels pi(x, y).
(132) At each coordinate, take the maximum over the ten directional saliency maps to form the maximum feature saliency map Smax(x, y) = max(pi(x, y)), where the function max is the operator extracting the maximum of the pixels pi(x, y).
(133) At each coordinate, take the average over the ten directional saliency maps to form the average feature saliency map Smean(x, y) = mean(pi(x, y)), where the function mean is the operator extracting the average of the pixels pi(x, y) at the same position.
(134) So that the video-frame summary map retains as far as possible the completeness of the character edge features in every direction, while reducing the influence of the illumination changes that readily occur in outdoor video, the minimum, maximum and average feature saliency maps above, which preserve the edge features of the different set directions, are fused according to the fusion formula to obtain the final video-frame feature summary map fsg.
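A sketch of the fusion of steps (131) to (134). The exact fusion formula is not preserved in the text, so combining Smin, Smax and Smean by a plain average is an assumption of this sketch:

```python
import numpy as np

def fuse(saliency: dict) -> np.ndarray:
    """Fuse the ten directional saliency maps into one feature summary map:
    per-pixel minimum, maximum and mean maps are formed and then combined
    (here by averaging, an assumed stand-in for the patent's formula)."""
    stack = np.stack(list(saliency.values()))  # shape (10, h, w)
    s_min = stack.min(axis=0)
    s_max = stack.max(axis=0)
    s_mean = stack.mean(axis=0)
    return (s_min + s_max + s_mean) / 3.0
```

Whatever the exact combination, the design intent stated above is clear: the minimum map suppresses responses that only one direction produces (often noise), the maximum map preserves the strongest edge in any direction, and the mean map stabilises the result against illumination changes.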
Referring again to Fig. 3, the concrete operations of step 2 of the above two operating steps are described in detail:
(21) Perform colour clustering on the video-frame feature summary map together with the hue, saturation, value (HSV) colour space using the K-means clustering algorithm, dividing the feature summary map into four classes of K-means colour clustering results representing four kinds of region: background, foreground text characters, character outlines and noise.
(22) Connected-component processing based on stroke width: for the above four classes of K-means colour clustering results, compute the edge-pixel stroke width of each connected component, then analyse each connected component by stroke width and remove the background and noise regions. Step (22) comprises the following operations.
(221) Based on the ten saliency maps of step 1, compute the gradient direction angle θ of each pixel in the video-frame summary map;
(222) Because character regions do not appear at, or touch, the border of the video image, delete the connected components in the video summary map that touch the top, bottom, left or right border of the image;
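Step (222) is a standard border-component removal. A breadth-first sketch on a binary mask, with 4-connectivity assumed (the patent does not state the connectivity):

```python
import numpy as np
from collections import deque

def remove_border_components(binary: np.ndarray) -> np.ndarray:
    """Zero out every connected component that touches any image border,
    since character regions are assumed not to touch the frame edge."""
    h, w = binary.shape
    out = binary.copy()
    # Seed the flood from every foreground pixel on the border.
    q = deque((i, j) for i in range(h) for j in range(w)
              if out[i, j] and (i in (0, h - 1) or j in (0, w - 1)))
    while q:
        i, j = q.popleft()
        if not out[i, j]:
            continue
        out[i, j] = 0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and out[ni, nj]:
                q.append((ni, nj))
    return out
```

The input mask is left untouched; the cleaned copy keeps only interior components, which are the candidate character regions passed on to step (223).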
(223) Obtain the boundary pixels of each connected component, then search forward from each boundary pixel along its gradient direction angle θ until another boundary pixel is found, and set the pixel values of the two boundary pixels to the stroke width between them.
(224) First compute the stroke widths of all boundary pixels of the same connected component, then compute the variance of the stroke widths of all the boundary pixels. If the computed variance is less than 0.5, the boundary-pixel stroke widths of the connected component are considered close to their true values, and the component is retained as a candidate character region (the basis of this operation is that the aspect ratios of Western-script and Chinese character regions are relatively constant, i.e. the stroke width values within a Western or Chinese character are similar); otherwise, the regions of the connected component with larger length and width are considered not to belong to characters and are deleted.
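The variance test of step (224) reduces to a small predicate. The 0.5 threshold is the one stated above; using the population variance is an assumption of this sketch:

```python
import statistics

def is_candidate_character(stroke_widths, var_threshold=0.5):
    """Retain a connected component as a character candidate only if the
    variance of its boundary-pixel stroke widths is below the threshold,
    i.e. the stroke width is nearly constant across the component."""
    if len(stroke_widths) < 2:
        return False  # too few samples to judge constancy
    return statistics.pvariance(stroke_widths) < var_threshold
```

A printed or painted character has an almost uniform stroke, so its widths cluster tightly; foliage, railings and other textured background produce widely scattered widths and fail the test.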
(23) For the smaller noise regions still left after the processing of step (22), perform connected-component processing based on geometric shape: compute the size of each connected component in the character image, i.e. the number of pixels it contains, and delete the components whose proportions mark them as noise regions, to improve the quality of the target image. The concrete operation is to compute the major-axis length of each component; if the major-axis length is greater than one third of the width of the feature summary map, or less than one tenth of it, the component is considered too large or too small, does not belong to a character region, and is deleted.
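The geometric filter of step (23) is a bounds check on the major-axis length relative to the image width. Treating the two bounds as inclusive is an assumption of this sketch:

```python
def keep_by_major_axis(major_axis_len: float, image_width: float) -> bool:
    """Geometric filter of step (23): discard a connected component whose
    major-axis length exceeds one third of the summary-map width (too
    large) or falls below one tenth of it (too small)."""
    return image_width / 10.0 <= major_axis_len <= image_width / 3.0
```

For a 300-pixel-wide summary map, components with a major axis between 30 and 100 pixels survive; a banner spanning half the frame or a few-pixel speck is removed.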
(24) Obtain the scene text region: analyze all connected components in the four clustering results and merge the components finally retained from all clusters into a single image; then, according to two measures of each connected component, distance and stroke width, judge nearby components to belong to the same region, thereby obtaining the final video scene text.
The method of the present invention has been verified in multiple simulation experiments, with successful results. Referring to (A), (B) and (C) of Fig. 4, the three figures show, for one embodiment of the method, the original video frame, the video frame feature summary map obtained in step 1, and the result of step 2: an example of the outdoor scene text acquired from the video. That is, the input is a video frame containing scene text, and after processing by the method of the present invention the output is the complete acquired scene text, which can be used for subsequent scene text recognition. The present invention therefore achieves its objectives well and has good prospects for popularization and application.

Claims (10)

  1. A method for automatically acquiring outdoor scene text in video based on a feature summary map, characterized in that the method comprises the following operational steps:
    Step 1: acquire a video frame image containing scene text, and generate a video frame feature summary map based on the RGB color space of the video frame image: first extract, on the RGB color space, four convolution maps covering the horizontal, vertical, 45-degree and 135-degree directions, obtaining four directional feature vectors that characterize the RGB color space; then compute the pairwise vector products of the four directional feature vectors, obtaining ten saliency maps that represent different directions of the video frame; finally, fuse the ten saliency maps of the different directions to obtain the video frame feature summary map, which provides the visual characteristics for the subsequent acquisition of the scene text in the video, suppresses background and noise interference, and improves recognition precision;
    Step 2: acquire the scene text automatically: first perform K-means color clustering based on the video frame feature summary map and the RGB color space, subdividing the summary map into four classes of results representing four regions: background, foreground characters, character outlines and noise; then perform connected-component analysis on each of the four classes, delete the background and noise regions, and obtain the final scene text.
  2. The method according to claim 1, characterized in that: in said extraction, on the RGB color space, of the four convolution maps covering the horizontal, vertical, 45-degree and 135-degree directions, the horizontal-direction convolution map uses the horizontal convolution kernel of the Sobel operator, i.e. the calculation template for the horizontal-direction derivative; the vertical-direction convolution map uses the vertical convolution kernel of the Sobel operator, i.e. the calculation template for the vertical-direction derivative; the 45-degree-direction convolution map uses the 45-degree convolution kernel, i.e. the calculation template for the 45-degree directional derivative; and the 135-degree-direction convolution map uses the 135-degree convolution kernel, i.e. the calculation template for the 135-degree directional derivative. This convolution-kernel-based feature extraction is algorithmically simple and fast, which favors engineering implementation, and the extracted convolution features are not easily affected by illumination changes in outdoor scenes.
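The four directional templates appear only as images in the original patent and do not survive in this text. The sketch below assumes the standard 3×3 Sobel horizontal and vertical kernels and their common 45/135-degree diagonal variants, applied as plain cross-correlation on a single colour channel; it is an illustration, not the patent's authoritative templates.

```python
# Assumed directional derivative templates (the patent's own images are lost).
KERNEL_H = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # horizontal derivative
KERNEL_V = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # vertical derivative
KERNEL_45 = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]    # 45-degree derivative
KERNEL_135 = [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]]   # 135-degree derivative

def convolve(channel, kernel):
    """Valid-mode cross-correlation of one colour channel (a list of rows)
    with a 3x3 kernel; returns a map two rows/columns smaller."""
    h, w = len(channel), len(channel[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += kernel[dy][dx] * channel[y - 1 + dy][x - 1 + dx]
            row.append(acc)
        out.append(row)
    return out
```

On a channel containing a horizontal step edge, KERNEL_H responds strongly while KERNEL_V responds with zero, as expected for directional derivatives.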
  3. The method according to claim 1, characterized in that step 1 comprises the following operations:
    (11) First extract on the red channel the horizontal-direction convolution map Rh, the vertical-direction convolution map Rv, the 45-degree-direction convolution map Rl and the 135-degree-direction convolution map Rr; likewise extract on the green channel Gh, Gv, Gl and Gr, and on the blue channel Bh, Bv, Bl and Br; then arrange the directional convolution maps according to the RGB color space, obtaining the four directional feature vectors that characterize it: the horizontal feature vector H={Rh,Gh,Bh}, the vertical feature vector V={Rv,Gv,Bv}, the 45-degree feature vector L={Rl,Gl,Bl} and the 135-degree feature vector R={Rr,Gr,Br};
    (12) compute the pairwise vector products of the four directional feature vectors, obtaining ten saliency maps that represent different directions of the video frame; this retains the edge features of several set directions while suppressing background and noise interference from the remaining directions, and yields the stroke features of the scene text in multiple directions, which helps to extract the scene text automatically;
    (13) fuse the ten saliency maps of the different directions to obtain the video frame feature summary map, which provides the visual characteristics for the subsequent acquisition of the scene text in the video, suppresses background and noise interference, and improves the precision and completeness of the automatic scene text acquisition.
  4. The method according to claim 3, characterized in that step (12) comprises the following operations:
    (120) According to the formula Shh={Rh,Gh,Bh}×{Rh,Gh,Bh}, compute the self-product of the horizontal feature vector to obtain the horizontal saliency map Shh, which retains and strengthens the edge features of the horizontal direction while weakening edge features of other directions;
    (121) according to the formula Svv={Rv,Gv,Bv}×{Rv,Gv,Bv}, compute the self-product of the vertical feature vector to obtain the vertical saliency map Svv, which retains and strengthens the edge features of the vertical direction while weakening edge features of other directions;
    (122) according to the formula Sll={Rl,Gl,Bl}×{Rl,Gl,Bl}, compute the self-product of the 45-degree feature vector to obtain the 45-degree saliency map Sll, which retains and strengthens the edge features of the 45-degree direction while weakening edge features of other directions;
    (123) according to the formula Srr={Rr,Gr,Br}×{Rr,Gr,Br}, compute the self-product of the 135-degree feature vector to obtain the 135-degree saliency map Srr, which retains and strengthens the edge features of the 135-degree direction while weakening edge features of other directions;
    (124) according to the formula Shv={Rh,Gh,Bh}×{Rv,Gv,Bv}, compute the product of the horizontal and vertical feature vectors to obtain the horizontal-vertical saliency map Shv, which retains and strengthens the edge features of the horizontal and vertical directions while weakening edge features of other directions;
    (125) according to the formula Shl={Rh,Gh,Bh}×{Rl,Gl,Bl}, compute the product of the horizontal and 45-degree feature vectors to obtain the horizontal-45-degree saliency map Shl, which retains and strengthens the edge features of the horizontal and 45-degree directions while weakening edge features of other directions;
    (126) according to the formula Shr={Rh,Gh,Bh}×{Rr,Gr,Br}, compute the product of the horizontal and 135-degree feature vectors to obtain the horizontal-135-degree saliency map Shr, which retains and strengthens the edge features of the horizontal and 135-degree directions while weakening edge features of other directions;
    (127) according to the formula Svl={Rv,Gv,Bv}×{Rl,Gl,Bl}, compute the product of the vertical and 45-degree feature vectors to obtain the vertical-45-degree saliency map Svl, which retains and strengthens the edge features of the vertical and 45-degree directions while weakening edge features of other directions;
    (128) according to the formula Svr={Rv,Gv,Bv}×{Rr,Gr,Br}, compute the product of the vertical and 135-degree feature vectors to obtain the vertical-135-degree saliency map Svr, which retains and strengthens the edge features of the vertical and 135-degree directions while weakening edge features of other directions;
    (129) according to the formula Slr={Rl,Gl,Bl}×{Rr,Gr,Br}, compute the product of the 45-degree and 135-degree feature vectors to obtain the 45-degree-135-degree saliency map Slr, which retains and strengthens the edge features of the 45-degree and 135-degree directions while weakening edge features of other directions.
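The ten pairwise products of steps (120)-(129) can be enumerated programmatically. The patent writes the product as {Rh,Gh,Bh}×{Rv,Gv,Bv} without defining it further; a per-pixel dot product over the three colour channels is assumed here as one plausible reading.

```python
from itertools import combinations_with_replacement

def dot_map(u, v):
    """Assumed product of two 3-channel feature vectors: the per-pixel dot
    product over the R, G, B component maps. u and v are triples of
    equally sized 2-D maps (lists of rows)."""
    h, w = len(u[0]), len(u[0][0])
    return [[sum(u[c][y][x] * v[c][y][x] for c in range(3))
             for x in range(w)] for y in range(h)]

def saliency_maps(features):
    """All pairwise products, self-products included, of the four
    directional feature vectors H, V, L, R: C(4,2) + 4 = 10 maps."""
    names = sorted(features)
    return {a + b: dot_map(features[a], features[b])
            for a, b in combinations_with_replacement(names, 2)}
```

With the four vectors H, V, L and R this yields exactly the ten maps Shh ... Slr listed above.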
  5. The method according to claim 3, characterized in that the fusion of the ten saliency maps of the different directions in step (13) comprises: based on the ten directional saliency maps extracted in step (12), compute, for the pixels at the same coordinates across the maps, the maximum, minimum and mean values respectively, and superimpose these results to obtain the final video frame summary map fsg.
  6. The method according to claim 4 or 5, characterized in that step (13) comprises the following operations:
    (131) Take, for each coordinate, the minimum of the pixels at that coordinate across the ten directional saliency maps, forming the minimum feature saliency map Smin(x,y)=min(pi(x,y)), where pi(x,y) is the pixel value at coordinate (x,y) of each saliency map, the subscript i denotes the saliency map, i∈{Shh,Svv,Sll,Srr,Shv,Shl,Shr,Svl,Svr,Slr}, and the function min is the operator extracting the minimum of the pixels pi(x,y);
    (132) take, for each coordinate, the maximum of the pixels at that coordinate across the ten directional saliency maps, forming the maximum feature saliency map Smax(x,y)=max(pi(x,y)), where the function max is the operator extracting the maximum of the pixels pi(x,y);
    (133) take, for each coordinate, the mean of the pixels at that coordinate across the ten directional saliency maps, forming the mean feature saliency map Smean(x,y)=mean(pi(x,y)), where the function mean is the operator extracting the mean of the pixels pi(x,y) at the same position;
    (134) in order to preserve in the video frame summary map the edge features of the characters in every direction as completely as possible, while reducing the influence of the illumination changes that readily occur in outdoor videos, the three feature saliency maps above (minimum, maximum and mean), which retain the edge features of different directions, are fused according to the given formula to obtain the final video frame feature summary map fsg.
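Steps (131)-(134) can be sketched as below. The patent's final combination formula is an image lost from this text; a simple superposition (sum) of the minimum, maximum and mean maps is assumed here purely for illustration.

```python
def fuse_summary(saliency_maps):
    """Steps (131)-(134): per pixel, take the minimum, maximum and mean
    across the directional saliency maps and superimpose them into the
    final summary map fsg (superposition assumed here to be a sum)."""
    h, w = len(saliency_maps[0]), len(saliency_maps[0][0])
    fsg = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [m[y][x] for m in saliency_maps]
            fsg[y][x] = min(vals) + max(vals) + sum(vals) / len(vals)
    return fsg
```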
  7. The method according to claim 1, characterized in that step 2 comprises the following operations:
    (21) Perform color clustering, based on the K-means clustering algorithm, on the hue, saturation, value (HSV) color space of the video frame feature summary map: divide the summary map into four classes of K-means color clustering results representing four regions: background, foreground characters, character outlines and noise;
    (22) connected-component processing based on stroke width: for the above four classes of K-means color clustering results, compute the edge-pixel stroke width of each connected component respectively, then analyze each component based on its stroke width, and delete the background and noise regions;
    (23) for the smaller noise regions that still remain after step (22), perform connected-component processing based on geometry: compute the number of pixels contained in each connected component of the character image, and delete the components judged to be noise, so as to improve the quality of the target image;
    (24) obtain the scene text region: analyze all connected components in the four clustering results and merge the components finally retained from all clusters into a single image; then, according to two measures of each connected component, distance and stroke width, judge nearby components to belong to the same region, thereby obtaining the final video scene text.
  8. The method according to claim 6, characterized in that the color clustering of the K-means clustering algorithm is realized as a four-dimensional clustering over the spatial position, hue, saturation and value of the video summary map, according to the cosine distance of the angle between each pixel and the four cluster centers. Because of illumination changes in outdoor environments, a single character may present different colors in the video frame, which seriously impairs the completeness of the extracted strokes; therefore the above four-dimensional clustering is used and is computed with the angle-cosine distance function, which does not emphasize the numerical differences themselves, so that the influence of outdoor illumination changes on character color can be significantly reduced.
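A sketch of K-means with the angle-cosine distance of claim 8, on generic feature tuples (e.g. a spatial coordinate plus H, S, V). The iteration count, random initialization and exact feature layout are illustrative assumptions, not the patent's.

```python
import math
import random

def cosine_dist(a, b):
    """1 minus cosine similarity; insensitive to vector magnitude, so two
    pixels of the same hue under different illumination remain close."""
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(y * y for y in b)))
    return 1.0 - num / den if den else 1.0

def kmeans_cosine(points, k=4, iters=20, seed=0):
    """Plain Lloyd iteration using cosine distance for the assignment
    step; returns per-point labels and the final cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: cosine_dist(p, centers[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster emptied out
                centers[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    labels = [min(range(k), key=lambda c: cosine_dist(p, centers[c]))
              for p in points]
    return labels, centers
```

Because cosine distance ignores magnitude, a darker and a brighter sample of the same colour direction fall into the same cluster, which is the illumination robustness the claim describes.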
  9. The method according to claim 6, characterized in that step (22) comprises the following operations:
    (221) Based on the ten saliency maps of step 1, compute the gradient direction angle θ of each pixel in the video frame summary map;
    (222) since character regions neither appear on nor touch the video image border, delete the connected components in the video summary map that are connected to the top, bottom, left or right border of the image;
    (223) obtain the boundary pixels of each connected component; then, from each boundary pixel, search forward along its gradient direction angle θ; when another boundary pixel is found, the number of pixels between the two boundary pixels is set as the stroke width of both pixels;
    (224) first compute the stroke width of all boundary pixels of the same connected component, then compute the variance of these stroke widths; if the variance is less than 0.5, the boundary-pixel stroke widths of the component are considered close to the true value and the component is retained as a candidate character region; otherwise the component is considered not to belong to a character and is deleted.
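Steps (221)-(223) can be sketched as a ray march on a binary boundary map; the unit step size, rounding to the pixel grid, and the search cut-off are illustrative assumptions.

```python
import math

def stroke_width_at(edge, theta, y, x, max_len=50):
    """March from boundary pixel (y, x) along the gradient direction
    theta until another boundary pixel is hit; the pixel count between
    the two endpoints, inclusive, is taken as the stroke width of both
    (step 223). edge is a 2-D 0/1 map. Returns None when no opposite
    boundary is found within max_len steps."""
    dy, dx = math.sin(theta), math.cos(theta)
    fy, fx = float(y), float(x)
    for step in range(1, max_len):
        fy += dy
        fx += dx
        iy, ix = int(round(fy)), int(round(fx))
        if not (0 <= iy < len(edge) and 0 <= ix < len(edge[0])):
            return None  # ray left the image without hitting a boundary
        if edge[iy][ix]:
            return step + 1  # count both boundary pixels
    return None
```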
  10. The method according to claim 8, characterized in that: because the region aspect ratio of Western and Chinese characters is a set value, i.e. the stroke widths of Western or Chinese characters are similar, the regions of the connected component with larger length and width are deleted; for said connected component its major axis length is computed, and if the major axis length is greater than one third, or less than one tenth, of the width of the video frame feature summary map, the component is considered not to belong to a character region and is deleted.
CN201711381971.5A 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram Active CN108038458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711381971.5A CN108038458B (en) 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram


Publications (2)

Publication Number Publication Date
CN108038458A true CN108038458A (en) 2018-05-15
CN108038458B CN108038458B (en) 2021-04-09

Family

ID=62099983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711381971.5A Active CN108038458B (en) 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram

Country Status (1)

Country Link
CN (1) CN108038458B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276461A (en) * 2008-03-07 2008-10-01 北京航空航天大学 Method for increasing video text with edge characteristic
CN101515325A (en) * 2009-04-08 2009-08-26 北京邮电大学 Character extracting method in digital video based on character segmentation and color cluster
CN104751153A (en) * 2013-12-31 2015-07-01 中国科学院深圳先进技术研究院 Scene text recognizing method and device
WO2017089865A1 (en) * 2015-11-24 2017-06-01 Czech Technical University In Prague, Department Of Cybernetics Efficient unconstrained stroke detector
CN106874905A (en) * 2017-01-12 2017-06-20 中南大学 A kind of method of the natural scene text detection based on self study Color-based clustering
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAO DONG HUANG ET AL.: "Video Text Detection Based on Text Edge Map", 2013 3rd International Conference on Computer Science and Network Technology (ICCSNT) *
XIAODONG HUANG ET AL.: "Video Text Extraction Based on Stroke Width and Color", Proceedings of 3rd International Conference on Multimedia Technology (ICMT-13) *
XIAODONG HUANG: "Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform", Multimedia Tools and Applications *
HUANG Xiaodong: "Research on Video Text Acquisition Based on Feature Fusion", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829458A (en) * 2019-01-14 2019-05-31 上海交通大学 The method of the journal file of record system operatio behavior is automatically generated in real time
CN109829458B (en) * 2019-01-14 2023-04-04 上海交通大学 Method for automatically generating log file for recording system operation behavior in real time
CN110347870A (en) * 2019-06-19 2019-10-18 西安理工大学 The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method
CN110472550A (en) * 2019-08-02 2019-11-19 南通使爱智能科技有限公司 A kind of text image shooting integrity degree judgment method and system
CN113192033A (en) * 2021-04-30 2021-07-30 深圳市创想三维科技有限公司 Wire drawing distinguishing method, device, equipment and storage medium in 3D printing
CN113192033B (en) * 2021-04-30 2024-03-19 深圳市创想三维科技股份有限公司 Wire drawing judging method, device and equipment in 3D printing and storage medium


Similar Documents

Publication Publication Date Title
Yan et al. A fast uyghur text detector for complex background images
Gopalakrishnan et al. Salient region detection by modeling distributions of color and orientation
Jiang et al. Automatic salient object segmentation based on context and shape prior.
Maas et al. Using pattern recognition to automatically localize reflection hyperbolas in data from ground penetrating radar
CN106250895B (en) A kind of remote sensing image region of interest area detecting method
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
Wang et al. Background-driven salient object detection
CN108038458A (en) Outdoor Scene text automatic obtaining method in the video of feature based summary figure
US20100008576A1 (en) System and method for segmentation of an image into tuned multi-scaled regions
AU2014277853A1 (en) Object re-identification using self-dissimilarity
Wang et al. Airport detection in remote sensing images based on visual attention
Hu et al. Clothing segmentation using foreground and background estimation based on the constrained Delaunay triangulation
CN107688806A (en) A kind of free scene Method for text detection based on affine transformation
CN106874942B (en) Regular expression semantic-based target model rapid construction method
CN110728302A (en) Method for identifying color textile fabric tissue based on HSV (hue, saturation, value) and Lab (Lab) color spaces
CN104123554A (en) SIFT image characteristic extraction method based on MMTD
CN108537816A (en) A kind of obvious object dividing method connecting priori with background based on super-pixel
Ming et al. A blob detector in color images
CN108256518A (en) Detection method and detection device for character region
Hati et al. An image texture insensitive method for saliency detection
Hu et al. Fast face detection based on skin color segmentation using single chrominance Cr
Gui et al. A fast caption detection method for low quality video images
Wang et al. Character segmentation of color images from digital camera
Si-ming et al. Moving shadow detection based on Susan algorithm
Hong et al. A real-time critical part detection for the blurred image of infrared reconnaissance balloon with boundary curvature feature analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant