CN108038458A - Automatic acquisition method for outdoor scene text in video based on a feature summary map - Google Patents

Automatic acquisition method for outdoor scene text in video based on a feature summary map

Info

Publication number
CN108038458A
CN108038458A (application CN201711381971.5A; granted as CN108038458B)
Authority
CN
China
Prior art keywords
degree
feature
notable
video frame
directions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711381971.5A
Other languages
Chinese (zh)
Other versions
CN108038458B (en)
Inventor
黄晓冬
王勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN201711381971.5A priority Critical patent/CN108038458B/en
Publication of CN108038458A publication Critical patent/CN108038458A/en
Application granted granted Critical
Publication of CN108038458B publication Critical patent/CN108038458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 Detecting features for summarising video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

A method for automatically acquiring outdoor scene text in video, based on a feature summary map. First, video frames containing scene text are obtained, and a video-frame feature summary map is generated from the RGB colour space of each frame: four convolution maps are extracted on the RGB colour space in the horizontal, vertical, 45-degree and 135-degree directions, yielding four direction feature vectors that characterise the colour space; from these, ten saliency maps representing different directions of the video frame are obtained and fused to produce the feature summary map. K-means colour clustering is then performed on the feature summary map together with the colour space, giving four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is applied to each of the four classes, the background and noise regions are removed, and the final outdoor scene text is obtained automatically. The operating steps of the invention are simple and the computation is cheap; outdoor scene text can be recognised and acquired in real time, so the method has good prospects for wide application.

Description

Automatic acquisition method for outdoor scene text in video based on a feature summary map
Technical field
The present invention relates to a digital image processing method, and more precisely to a method for automatically acquiring outdoor scene text in video based on a feature summary map, belonging to the technical field of computer vision processing.
Background art
In the past several years, with the spread of digital image acquisition devices, smartphones and practical vision systems and equipment, content-based image understanding technology has attracted growing attention. Because scene text in images and video carries comparatively rich and direct semantic clues, it is regarded as an important object to be detected and recognised. Text detection, localisation, extraction and recognition are the key steps in obtaining textual information; the operations of detection, localisation and extraction are usually referred to collectively as text acquisition. For text recognition, text acquisition is an essential prerequisite, since it reduces the complex background and eliminates illumination effects, making recognition comparatively simple and easy. However, many adverse factors, such as uneven indoor and outdoor illumination, blurred video, complex backgrounds, perspective distortion, colour diversity, complex fonts and varying stroke widths, all pose serious challenges to the acquisition of video scene text.
At present, researchers at home and abroad have developed a variety of methods for acquiring video scene text. Scene text extraction is generally divided into two steps: (1) detection and localisation of the scene text, and (2) extraction of the scene text.
Prior-art scene text detection and localisation methods can be divided into four different kinds: colour-based, edge/gradient-based, texture-based and stroke-based. Specifically:
Colour-based scene text detection: this is a conventional approach that has been proposed and used for more than twenty years, and it is simple and efficient. Detection algorithms based on local thresholds are commonly used, and some researchers adopt the improved local-threshold acquisition of the Niblack algorithm, so that the method can quickly detect scene text against fairly simple backgrounds. Researchers have also proposed generating colour layers with the mean-shift algorithm, which markedly improves the robustness of text detection against complex backgrounds. However, when characters of multiple colours appear in a video or image under uneven illumination, colour-feature-based text detection runs into many problems.
Edge/gradient-based scene text detection: assuming that a text region exhibits strong and symmetric changes against the background, pixels with large, symmetric gradient values can be regarded as text pixels, which allows edge features and gradient features to be used for scene text detection. Researchers have also proposed a scene text detection algorithm based on edge enhancement; such work applies spatial constraints based on size, position and colour distance, and finds text candidate regions by clustering horizontally aligned "gradient vector flow". Detection algorithms based on AdaBoost classifiers have been proposed that combine gradient/edge features with various classifiers (such as artificial neural networks or AdaBoost); going further, a detection method has been proposed that adds a neural-network-based text localiser on top of the AdaBoost classifier. However, algorithms of this kind struggle to detect scene text against complex backgrounds with strong gradients.
Texture-based scene text detection: when the character region is fairly dense, scene text can be treated as a kind of texture. Many current methods detect scene text by extracting texture features, including the Fourier transform, the discrete cosine transform (DCT), wavelets, local binary patterns (LBP) and histograms of oriented gradients (HOG). Although texture features can effectively detect dense characters, this approach may fail on sparse characters. Researchers have therefore proposed detecting scene text from Fourier-domain features and from DCT coefficients in the frequency domain, and more recently an algorithm based on local Haar binary pattern features. However, when the background is complex, many background noise regions also exhibit text-like texture, which lowers the detection precision of this approach.
Stroke-based scene text detection: the stroke width transform (SWT) is used to compute the most probable stroke pixel width. Stroke-based features have proven very effective for high-resolution scene text detection, particularly when combined with suitable learning methods or fused with further features such as edge orientation variance (EOV), opposite edge pairs (OEPs) or spatial-temporal analysis. Recently, a Bandlet-based edge detector has been introduced to improve the edge discrimination of SWT, enhancing scene text and removing noisy edge points so that SWT can be used to detect low-resolution text. However, when detecting scene text whose characters vary in size and font, the detection precision of this approach declines markedly.
Prior-art scene text extraction methods can be divided into at least three kinds of algorithms: threshold-based, colour-based and character-stroke-based. Specifically:
Threshold-based text extraction: this approach falls into two sub-classes of algorithms: one uses a global threshold, such as the Otsu algorithm; the other uses a local threshold. A multi-threshold algorithm has also been proposed, in which the second-stage threshold depends on the threshold of the first stage, which noticeably improves the extraction result. But because threshold-based methods ignore the characteristics of scene text, they do not achieve satisfactory performance in this way.
Colour-based text extraction: this method first generates several candidate binary images using k-means or another clustering algorithm, then selects among the binary images by image analysis. Its main feature is the assumption that text colour is uniform, introducing colour clustering into scene text extraction. Its drawbacks are that, as a global computation, it is rather sensitive to non-uniform illumination, and both the computational cost of analysing multiple candidate images and the selection of the parameter k are extremely involved.
Stroke-based text extraction: two groups of asymmetric Gabor filters are first used to extract texture orientation and scale from the image, and these features are then applied to the character edges most likely to represent text, to enhance contrast. However, the algorithm is very sensitive to the size of the extracted characters and is unsuitable for extracting scene text from video.
In short, the prior-art techniques for detecting and localising scene text, and for extracting it, are unsatisfactory in many respects. Developing a better-performing and more complete method for acquiring scene text in video has therefore become a new problem of particular concern to technical personnel in the field.
Summary of the invention
In view of this, the object of the present invention is to provide a method for automatically acquiring outdoor scene text in video based on a feature summary map. The method remedies numerous defects of the prior art: it can correctly and completely acquire scene text under various difficult conditions, including uneven illumination, blurred or complex backgrounds, perspective distortion, diverse colours, complex fonts and varying stroke widths.
To achieve the above object, the present invention provides a method for automatically acquiring outdoor scene text in video based on a feature summary map, characterised in that the method comprises the following operational steps:
Step 1: obtain the video frame images containing scene text, and generate a video-frame feature summary map from the RGB colour space of the video frame images. First, four convolution maps are extracted on the RGB colour space, in the horizontal, vertical, 45-degree and 135-degree directions, giving the four direction feature vectors that characterise the RGB colour space. These four direction feature vectors are then multiplied pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame. Finally, the ten saliency maps of the different directions are fused to obtain the video-frame feature summary map, which serves as the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves recognition precision;
Step 2: acquire the scene text automatically. K-means colour clustering is first performed on the feature summary map together with the RGB colour space, subdividing the frame into four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is then applied to each of the four classes, the background and noise regions are removed, and the final scene text is obtained.
At present, where the background is complex and the illumination changeable, acquiring outdoor video scene text is extremely difficult. The innovation of the present invention is an automatic method for acquiring outdoor scene text in video. Its technical key is a brand-new video-frame feature summary map, which serves as the visual characteristic and basis for the automatic acquisition of scene text in video; the method removes background and noise interference well and markedly improves the accuracy and completeness of scene text detection and extraction. At the same time, K-means colour clustering is performed on the feature summary map together with the hue, saturation, value (HSV) colour space, and connected-component analyses based on stroke width and on geometric shape are then performed in turn; once the background and noise regions are removed, the final video scene text can be obtained quickly and automatically.
Multiple simulation experiments prove that the present invention remedies the defects of the prior art: in outdoor video with complex backgrounds, perspective distortion, diverse colours, uneven or strongly varying illumination, complex fonts and varying stroke widths, scene text can still be detected and extracted quickly, correctly and automatically. Moreover, the operating steps are fairly simple, the computational complexity is low, and the method is easy to implement; it can meet the demand for recognising and acquiring outdoor scene text in real time. The present invention therefore has good prospects for wide application.
Brief description of the drawings
Fig. 1 is a flow chart of the operating steps of the method of the present invention for automatically acquiring outdoor scene text in video based on a feature summary map.
Fig. 2 is a flow chart of the operations of step 1 of the method.
Fig. 3 is a flow chart of the operations of step 2 of the method.
Fig. 4 (A), (B) and (C) are schematic diagrams of three stages of an embodiment of the method: the original image, the video-frame feature summary map, and the acquired scene text.
Embodiment
To make the object, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
Referring to Fig. 1, the concrete operating steps of the method of the present invention for automatically acquiring outdoor scene text in video based on a feature summary map are as follows:
Step 1: obtain the video frame images containing scene text, and generate a video-frame feature summary map from the RGB colour space of the video frame images. First, four convolution maps are extracted on the RGB colour space, in the horizontal, vertical, 45-degree and 135-degree directions, giving the four direction feature vectors that characterise the RGB colour space. These four direction feature vectors are then multiplied pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame. Finally, the ten saliency maps of the different directions are fused to obtain the video-frame feature summary map, which serves as the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves recognition precision.
Among the four convolution maps extracted on the RGB colour space in the horizontal, vertical, 45-degree and 135-degree directions, the horizontal convolution map uses the horizontal-differential template of the Sobel operator:

    [-1 -2 -1]
    [ 0  0  0]
    [ 1  2  1]

the vertical convolution map uses the vertical-differential template of the Sobel operator:

    [-1  0  1]
    [-2  0  2]
    [-1  0  1]

the 45-degree convolution map uses the template for computing the 45-degree differential:

    [ 0  1  2]
    [-1  0  1]
    [-2 -1  0]

and the 135-degree convolution map uses the template for computing the 135-degree differential:

    [-2 -1  0]
    [-1  0  1]
    [ 0  1  2]

The kernel-based convolution feature extraction used by the present invention has the advantages that the algorithm is simple and the computation fast, which favours engineering implementation, and the extracted convolution features are not easily affected by the illumination changes of outdoor scenes.
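The directional convolutions of step 1 can be sketched in Python as follows. This is an illustrative implementation, not part of the patent: the function names are invented, and the diagonal kernels are the common 45/135-degree Sobel variants, assumed to match the templates named above.

```python
import numpy as np

# Standard Sobel-style templates for the four directions.
K_H   = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)    # horizontal edges
K_V   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)    # vertical edges
K_45  = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)    # 45-degree edges
K_135 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float)    # 135-degree edges

def convolve(channel: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size cross-correlation with zero padding (sufficient for this sketch)."""
    h, w = channel.shape
    pad = np.pad(channel, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * kernel)
    return out

def direction_maps(rgb: np.ndarray) -> dict:
    """Return the four direction feature vectors H, V, L, R of step (11):
    each is an (h, w, 3) stack of the per-channel convolution maps."""
    maps = {}
    for name, k in (("H", K_H), ("V", K_V), ("L", K_45), ("R", K_135)):
        maps[name] = np.dstack([convolve(rgb[..., c], k) for c in range(3)])
    return maps
```

On a frame with a horizontal step edge, only the horizontal kernel responds away from the image borders, which is the direction selectivity the saliency maps of step (12) build on.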
Step 2: acquire the scene text automatically. K-means colour clustering is first performed on the feature summary map together with the RGB colour space, subdividing the frame into four classes of results representing four kinds of region: background, foreground text characters, character outlines and noise. Connected-component analysis is then applied to each of the four classes, the background and noise regions are removed, and the final scene text is obtained.
The colour clustering in the K-means algorithm used by the present invention is a four-dimensional clustering performed in the joint space of the video summary map and hue, saturation and value, assigning each pixel according to its cosine-of-angle distance to each of the four cluster centres. Because illumination in the outdoor environment varies violently, the text characters in a video frame take on different colours, which seriously harms the completeness of the extracted strokes. The four-dimensional clustering therefore uses, instead of the common Euclidean distance, a cosine-of-angle distance function that does not emphasise numerical differences themselves, which markedly reduces the influence of outdoor illumination changes on character colour.
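A minimal sketch of K-means driven by cosine distance, as in the four-dimensional clustering described above. The farthest-point seeding and the fixed iteration count are implementation choices of this sketch, not specified by the patent.

```python
import numpy as np

def kmeans_cosine(features: np.ndarray, k: int = 4, iters: int = 20, seed: int = 0):
    """K-means with cosine (angular) distance instead of Euclidean distance.
    `features` is (n, d): one row per pixel, e.g. [summary, h, s, v]."""
    rng = np.random.default_rng(seed)
    norm = lambda x: x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)
    f = norm(features.astype(float))
    # Farthest-angle seeding (an assumption of this sketch, not from the patent):
    centers = [f[rng.integers(len(f))]]
    for _ in range(k - 1):
        sims = np.max(np.stack([f @ c for c in centers]), axis=0)
        centers.append(f[np.argmin(sims)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(f @ centers.T, axis=1)  # closest-angle centre wins
        for c in range(k):
            members = f[labels == c]
            if len(members):
                centers[c] = norm(members.mean(axis=0))
    return labels, centers
```

Because vectors are normalised before clustering, two pixels whose colours differ only in brightness point in nearly the same direction and land in the same cluster, which is the illumination robustness the patent attributes to the cosine distance.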
Referring to Fig. 2, the concrete operations of step 1 of the above two operating steps are first described in detail:
(11) Extract on the red channel the horizontal convolution map Rh, the vertical convolution map Rv, the 45-degree convolution map Rl and the 135-degree convolution map Rr; extract on the green channel the horizontal convolution map Gh, the vertical convolution map Gv, the 45-degree convolution map Gl and the 135-degree convolution map Gr; and extract on the blue channel the horizontal convolution map Bh, the vertical convolution map Bv, the 45-degree convolution map Bl and the 135-degree convolution map Br. Arrange these directional convolution maps according to the RGB colour space to obtain the four direction feature vectors characterising the RGB colour space: the horizontal feature vector H = {Rh, Gh, Bh}, the vertical feature vector V = {Rv, Gv, Bv}, the 45-degree feature vector L = {Rl, Gl, Bl} and the 135-degree feature vector R = {Rr, Gr, Br}.
(12) Multiply the four direction feature vectors pairwise, element-wise, to obtain ten saliency maps representing different directions of the video frame, so that while the edge features of several set directions are retained, the background and noise interference of the remaining directions are removed and the stroke features of the scene text in several directions are obtained, which helps extract the scene text automatically. Step (12) is subdivided into the following operations:
(120) Compute the element-wise square of the horizontal feature vector according to Shh = {Rh, Gh, Bh} × {Rh, Gh, Bh} to obtain the horizontal saliency map Shh, which retains and strengthens the horizontal edge features while weakening those of the other directions.
(121) Compute the element-wise square of the vertical feature vector according to Svv = {Rv, Gv, Bv} × {Rv, Gv, Bv} to obtain the vertical saliency map Svv, which retains and strengthens the vertical edge features while weakening those of the other directions.
(122) Compute the element-wise square of the 45-degree feature vector according to Sll = {Rl, Gl, Bl} × {Rl, Gl, Bl} to obtain the 45-degree saliency map Sll, which retains and strengthens the 45-degree edge features while weakening those of the other directions.
(123) Compute the element-wise square of the 135-degree feature vector according to Srr = {Rr, Gr, Br} × {Rr, Gr, Br} to obtain the 135-degree saliency map Srr, which retains and strengthens the 135-degree edge features while weakening those of the other directions.
(124) Compute the element-wise product of the horizontal and vertical feature vectors according to Shv = {Rh, Gh, Bh} × {Rv, Gv, Bv} to obtain the horizontal-vertical saliency map Shv, which retains and strengthens the horizontal-vertical edge features while weakening those of the other directions.
(125) Compute the element-wise product of the horizontal and 45-degree feature vectors according to Shl = {Rh, Gh, Bh} × {Rl, Gl, Bl} to obtain the horizontal-45-degree saliency map Shl, which retains and strengthens the horizontal-45-degree edge features while weakening those of the other directions.
(126) Compute the element-wise product of the horizontal and 135-degree feature vectors according to Shr = {Rh, Gh, Bh} × {Rr, Gr, Br} to obtain the horizontal-135-degree saliency map Shr, which retains and strengthens the horizontal-135-degree edge features while weakening those of the other directions.
(127) Compute the element-wise product of the vertical and 45-degree feature vectors according to Svl = {Rv, Gv, Bv} × {Rl, Gl, Bl} to obtain the vertical-45-degree saliency map Svl, which retains and strengthens the vertical-45-degree edge features while weakening those of the other directions.
(128) Compute the element-wise product of the vertical and 135-degree feature vectors according to Svr = {Rv, Gv, Bv} × {Rr, Gr, Br} to obtain the vertical-135-degree saliency map Svr, which retains and strengthens the vertical-135-degree edge features while weakening those of the other directions.
(129) Compute the element-wise product of the 45-degree and 135-degree feature vectors according to Slr = {Rl, Gl, Bl} × {Rr, Gr, Br} to obtain the 45-135-degree saliency map Slr, which retains and strengthens the 45-135-degree edge features while weakening those of the other directions.
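The ten products of steps (120) to (129) can be generated mechanically. A sketch, under the assumption that each element-wise product is summed over the three colour channels to give a single-channel saliency map (the patent leaves this channel reduction implicit):

```python
import numpy as np
from itertools import combinations_with_replacement

def saliency_maps(H, V, L, R):
    """Form the ten saliency maps Shh ... Slr as element-wise products of the
    direction feature vectors (each an (h, w, 3) array), summed over the
    colour channels so each map is a single-channel image."""
    vecs = {"h": H, "v": V, "l": L, "r": R}
    maps = {}
    # combinations_with_replacement over "hvlr" yields exactly the ten pairs
    # hh, hv, hl, hr, vv, vl, vr, ll, lr, rr of steps (120)-(129).
    for a, b in combinations_with_replacement("hvlr", 2):
        maps["S" + a + b] = np.sum(vecs[a] * vecs[b], axis=-1)
    return maps
```

The four squared maps (Shh, Svv, Sll, Srr) are non-negative by construction, while the six cross-products can take either sign where two directions disagree.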
(13) Fuse the ten saliency maps of the different directions to obtain the video-frame feature summary map, which provides the visual characteristic for subsequently acquiring the scene text in the video, removes background and noise interference, and improves the precision and completeness of the automatic scene text acquisition.
In step (13), the fusion of the ten directional saliency maps proceeds as follows: based on the ten saliency maps of different directions extracted in step (12), the maximum, minimum and average of the pixels at the same coordinates in each image are computed, and the results are superimposed to obtain the final video-frame summary map fsg. The operations of step (13) are as follows:
(131) At each coordinate, take the minimum over the ten directional saliency maps to form the minimum feature saliency map Smin(x, y) = min(pi(x, y)), where pi(x, y) is the pixel value at coordinate (x, y) of saliency map i, the subscript i is the saliency map class, i ∈ {Shh, Svv, Sll, Srr, Shv, Shl, Shr, Svl, Svr, Slr}, and the function min is the operator extracting the minimum of the pixels pi(x, y).
(132) At each coordinate, take the maximum over the ten directional saliency maps to form the maximum feature saliency map Smax(x, y) = max(pi(x, y)), where the function max is the operator extracting the maximum of the pixels pi(x, y).
(133) At each coordinate, take the average over the ten directional saliency maps to form the average feature saliency map Smean(x, y) = mean(pi(x, y)), where the function mean is the operator extracting the average of the pixels pi(x, y) at the same position.
(134) So that the video-frame summary map retains as far as possible the completeness of the character edge features in every direction, while reducing the influence of the illumination changes that readily occur in outdoor video, the minimum, maximum and average feature saliency maps above, which preserve the edge features of the different set directions, are fused according to the fusion formula to obtain the final video-frame feature summary map fsg.
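A sketch of the fusion of steps (131) to (134). The exact fusion formula is not preserved in the text, so combining Smin, Smax and Smean by a plain average is an assumption of this sketch:

```python
import numpy as np

def fuse(saliency: dict) -> np.ndarray:
    """Fuse the ten directional saliency maps into one feature summary map:
    per-pixel minimum, maximum and mean maps are formed and then combined
    (here by averaging, an assumed stand-in for the patent's formula)."""
    stack = np.stack(list(saliency.values()))  # shape (10, h, w)
    s_min = stack.min(axis=0)
    s_max = stack.max(axis=0)
    s_mean = stack.mean(axis=0)
    return (s_min + s_max + s_mean) / 3.0
```

Whatever the exact combination, the design intent stated above is clear: the minimum map suppresses responses that only one direction produces (often noise), the maximum map preserves the strongest edge in any direction, and the mean map stabilises the result against illumination changes.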
Referring again to Fig. 3, the concrete operations of step 2 of the above two operating steps are described in detail:
(21) Perform colour clustering on the video-frame feature summary map together with the hue, saturation, value (HSV) colour space using the K-means clustering algorithm, dividing the feature summary map into four classes of K-means colour clustering results representing four kinds of region: background, foreground text characters, character outlines and noise.
(22) Connected-component processing based on stroke width: for the above four classes of K-means colour clustering results, compute the edge-pixel stroke width of each connected component, then analyse each connected component by stroke width and remove the background and noise regions. Step (22) comprises the following operations.
(221) Based on the ten saliency maps of step 1, compute the gradient direction angle θ of each pixel in the video-frame summary map;
(222) Because character regions do not appear at, or touch, the border of the video image, delete the connected components in the video summary map that touch the top, bottom, left or right border of the image;
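Step (222) is a standard border-component removal. A breadth-first sketch on a binary mask, with 4-connectivity assumed (the patent does not state the connectivity):

```python
import numpy as np
from collections import deque

def remove_border_components(binary: np.ndarray) -> np.ndarray:
    """Zero out every connected component that touches any image border,
    since character regions are assumed not to touch the frame edge."""
    h, w = binary.shape
    out = binary.copy()
    # Seed the flood from every foreground pixel on the border.
    q = deque((i, j) for i in range(h) for j in range(w)
              if out[i, j] and (i in (0, h - 1) or j in (0, w - 1)))
    while q:
        i, j = q.popleft()
        if not out[i, j]:
            continue
        out[i, j] = 0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and out[ni, nj]:
                q.append((ni, nj))
    return out
```

The input mask is left untouched; the cleaned copy keeps only interior components, which are the candidate character regions passed on to step (223).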
(223) Obtain the boundary pixels of each connected component, then search forward from each boundary pixel along its gradient direction angle θ until another boundary pixel is found, and set the pixel values of the two boundary pixels to the stroke width between them.
(224) First compute the stroke widths of all boundary pixels of the same connected component, then compute the variance of the stroke widths of all the boundary pixels. If the computed variance is less than 0.5, the boundary-pixel stroke widths of the connected component are considered close to their true values, and the component is retained as a candidate character region (the basis of this operation is that the aspect ratios of Western-script and Chinese character regions are relatively constant, i.e. the stroke width values within a Western or Chinese character are similar); otherwise, the regions of the connected component with larger length and width are considered not to belong to characters and are deleted.
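The variance test of step (224) reduces to a small predicate. The 0.5 threshold is the one stated above; using the population variance is an assumption of this sketch:

```python
import statistics

def is_candidate_character(stroke_widths, var_threshold=0.5):
    """Retain a connected component as a character candidate only if the
    variance of its boundary-pixel stroke widths is below the threshold,
    i.e. the stroke width is nearly constant across the component."""
    if len(stroke_widths) < 2:
        return False  # too few samples to judge constancy
    return statistics.pvariance(stroke_widths) < var_threshold
```

A printed or painted character has an almost uniform stroke, so its widths cluster tightly; foliage, railings and other textured background produce widely scattered widths and fail the test.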
(23) For the smaller noise regions still left after the processing of step (22), perform connected-component processing based on geometric shape: compute the size of each connected component in the character image, i.e. the number of pixels it contains, and delete the components whose proportions mark them as noise regions, to improve the quality of the target image. The concrete operation is to compute the major-axis length of each component; if the major-axis length is greater than one third of the width of the feature summary map, or less than one tenth of it, the component is considered too large or too small, does not belong to a character region, and is deleted.
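The geometric filter of step (23) is a bounds check on the major-axis length relative to the image width. Treating the two bounds as inclusive is an assumption of this sketch:

```python
def keep_by_major_axis(major_axis_len: float, image_width: float) -> bool:
    """Geometric filter of step (23): discard a connected component whose
    major-axis length exceeds one third of the summary-map width (too
    large) or falls below one tenth of it (too small)."""
    return image_width / 10.0 <= major_axis_len <= image_width / 3.0
```

For a 300-pixel-wide summary map, components with a major axis between 30 and 100 pixels survive; a banner spanning half the frame or a few-pixel speck is removed.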
(24) Obtain the scene text region: analyze all connected components in the four clustering results and merge the components finally retained from all clusters into a single image; then, according to two measures of each connected component, distance and stroke width, judge nearby components to belong to the same region, thereby obtaining the final video scene text.
The method of the present invention has been verified in multiple simulation experiments, with successful results. Referring to (A), (B) and (C) of Fig. 4, the three figures show, for one embodiment of the method, the original video frame, the video frame feature summary map obtained in step 1, and the result of step 2: an example of the outdoor scene text acquired from the video. That is, the input is a video frame containing scene text, and after processing by the method of the present invention the output is the complete acquired scene text, which can be used for subsequent scene text recognition. The present invention therefore achieves its objectives well and has good prospects for popularization and application.

Claims (10)

  1. A method for automatically acquiring outdoor scene text in video based on a feature summary map, characterized in that the method comprises the following operational steps:
    Step 1: acquire a video frame image containing scene text, and generate a video frame feature summary map based on the RGB color space of the video frame image: first extract, on the RGB color space, four convolution maps covering the horizontal, vertical, 45-degree and 135-degree directions, obtaining four directional feature vectors that characterize the RGB color space; then compute the pairwise vector products of the four directional feature vectors, obtaining ten saliency maps that represent different directions of the video frame; finally, fuse the ten saliency maps of the different directions to obtain the video frame feature summary map, which provides the visual characteristics for the subsequent acquisition of the scene text in the video, suppresses background and noise interference, and improves recognition precision;
    Step 2: acquire the scene text automatically: first perform K-means color clustering based on the video frame feature summary map and the RGB color space, subdividing the summary map into four classes of results representing four regions: background, foreground characters, character outlines and noise; then perform connected-component analysis on each of the four classes, delete the background and noise regions, and obtain the final scene text.
  2. The method according to claim 1, characterized in that: in said extraction, on the RGB color space, of the four convolution maps covering the horizontal, vertical, 45-degree and 135-degree directions, the horizontal-direction convolution map uses the horizontal convolution kernel of the Sobel operator, i.e. the calculation template for the horizontal-direction derivative; the vertical-direction convolution map uses the vertical convolution kernel of the Sobel operator, i.e. the calculation template for the vertical-direction derivative; the 45-degree-direction convolution map uses the 45-degree convolution kernel, i.e. the calculation template for the 45-degree directional derivative; and the 135-degree-direction convolution map uses the 135-degree convolution kernel, i.e. the calculation template for the 135-degree directional derivative. This convolution-kernel-based feature extraction is algorithmically simple and fast, which favors engineering implementation, and the extracted convolution features are not easily affected by illumination changes in outdoor scenes.
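The four directional templates appear only as images in the original patent and do not survive in this text. The sketch below assumes the standard 3×3 Sobel horizontal and vertical kernels and their common 45/135-degree diagonal variants, applied as plain cross-correlation on a single colour channel; it is an illustration, not the patent's authoritative templates.

```python
# Assumed directional derivative templates (the patent's own images are lost).
KERNEL_H = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # horizontal derivative
KERNEL_V = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # vertical derivative
KERNEL_45 = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]    # 45-degree derivative
KERNEL_135 = [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]]   # 135-degree derivative

def convolve(channel, kernel):
    """Valid-mode cross-correlation of one colour channel (a list of rows)
    with a 3x3 kernel; returns a map two rows/columns smaller."""
    h, w = len(channel), len(channel[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += kernel[dy][dx] * channel[y - 1 + dy][x - 1 + dx]
            row.append(acc)
        out.append(row)
    return out
```

On a channel containing a horizontal step edge, KERNEL_H responds strongly while KERNEL_V responds with zero, as expected for directional derivatives.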
  3. The method according to claim 1, characterized in that step 1 comprises the following operations:
    (11) First extract on the red channel the horizontal-direction convolution map Rh, the vertical-direction convolution map Rv, the 45-degree-direction convolution map Rl and the 135-degree-direction convolution map Rr; likewise extract on the green channel Gh, Gv, Gl and Gr, and on the blue channel Bh, Bv, Bl and Br; then arrange the directional convolution maps according to the RGB color space, obtaining the four directional feature vectors that characterize it: the horizontal feature vector H={Rh,Gh,Bh}, the vertical feature vector V={Rv,Gv,Bv}, the 45-degree feature vector L={Rl,Gl,Bl} and the 135-degree feature vector R={Rr,Gr,Br};
    (12) compute the pairwise vector products of the four directional feature vectors, obtaining ten saliency maps that represent different directions of the video frame; this retains the edge features of several set directions while suppressing background and noise interference from the remaining directions, and yields the stroke features of the scene text in multiple directions, which helps to extract the scene text automatically;
    (13) fuse the ten saliency maps of the different directions to obtain the video frame feature summary map, which provides the visual characteristics for the subsequent acquisition of the scene text in the video, suppresses background and noise interference, and improves the precision and completeness of the automatic scene text acquisition.
  4. The method according to claim 3, characterized in that step (12) comprises the following operations:
    (120) According to the formula Shh={Rh,Gh,Bh}×{Rh,Gh,Bh}, compute the self-product of the horizontal feature vector to obtain the horizontal saliency map Shh, which retains and strengthens the edge features of the horizontal direction while weakening edge features of other directions;
    (121) according to the formula Svv={Rv,Gv,Bv}×{Rv,Gv,Bv}, compute the self-product of the vertical feature vector to obtain the vertical saliency map Svv, which retains and strengthens the edge features of the vertical direction while weakening edge features of other directions;
    (122) according to the formula Sll={Rl,Gl,Bl}×{Rl,Gl,Bl}, compute the self-product of the 45-degree feature vector to obtain the 45-degree saliency map Sll, which retains and strengthens the edge features of the 45-degree direction while weakening edge features of other directions;
    (123) according to the formula Srr={Rr,Gr,Br}×{Rr,Gr,Br}, compute the self-product of the 135-degree feature vector to obtain the 135-degree saliency map Srr, which retains and strengthens the edge features of the 135-degree direction while weakening edge features of other directions;
    (124) according to the formula Shv={Rh,Gh,Bh}×{Rv,Gv,Bv}, compute the product of the horizontal and vertical feature vectors to obtain the horizontal-vertical saliency map Shv, which retains and strengthens the edge features of the horizontal and vertical directions while weakening edge features of other directions;
    (125) according to the formula Shl={Rh,Gh,Bh}×{Rl,Gl,Bl}, compute the product of the horizontal and 45-degree feature vectors to obtain the horizontal-45-degree saliency map Shl, which retains and strengthens the edge features of the horizontal and 45-degree directions while weakening edge features of other directions;
    (126) according to the formula Shr={Rh,Gh,Bh}×{Rr,Gr,Br}, compute the product of the horizontal and 135-degree feature vectors to obtain the horizontal-135-degree saliency map Shr, which retains and strengthens the edge features of the horizontal and 135-degree directions while weakening edge features of other directions;
    (127) according to the formula Svl={Rv,Gv,Bv}×{Rl,Gl,Bl}, compute the product of the vertical and 45-degree feature vectors to obtain the vertical-45-degree saliency map Svl, which retains and strengthens the edge features of the vertical and 45-degree directions while weakening edge features of other directions;
    (128) according to the formula Svr={Rv,Gv,Bv}×{Rr,Gr,Br}, compute the product of the vertical and 135-degree feature vectors to obtain the vertical-135-degree saliency map Svr, which retains and strengthens the edge features of the vertical and 135-degree directions while weakening edge features of other directions;
    (129) according to the formula Slr={Rl,Gl,Bl}×{Rr,Gr,Br}, compute the product of the 45-degree and 135-degree feature vectors to obtain the 45-degree-135-degree saliency map Slr, which retains and strengthens the edge features of the 45-degree and 135-degree directions while weakening edge features of other directions.
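The ten pairwise products of steps (120)-(129) can be enumerated programmatically. The patent writes the product as {Rh,Gh,Bh}×{Rv,Gv,Bv} without defining it further; a per-pixel dot product over the three colour channels is assumed here as one plausible reading.

```python
from itertools import combinations_with_replacement

def dot_map(u, v):
    """Assumed product of two 3-channel feature vectors: the per-pixel dot
    product over the R, G, B component maps. u and v are triples of
    equally sized 2-D maps (lists of rows)."""
    h, w = len(u[0]), len(u[0][0])
    return [[sum(u[c][y][x] * v[c][y][x] for c in range(3))
             for x in range(w)] for y in range(h)]

def saliency_maps(features):
    """All pairwise products, self-products included, of the four
    directional feature vectors H, V, L, R: C(4,2) + 4 = 10 maps."""
    names = sorted(features)
    return {a + b: dot_map(features[a], features[b])
            for a, b in combinations_with_replacement(names, 2)}
```

With the four vectors H, V, L and R this yields exactly the ten maps Shh ... Slr listed above.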
  5. The method according to claim 3, characterized in that the fusion of the ten saliency maps of the different directions in step (13) comprises: based on the ten directional saliency maps extracted in step (12), compute, for the pixels at the same coordinates across the maps, the maximum, minimum and mean values respectively, and superimpose these results to obtain the final video frame summary map fsg.
  6. The method according to claim 4 or 5, characterized in that step (13) comprises the following operations:
    (131) Take, for each coordinate, the minimum of the pixels at that coordinate across the ten directional saliency maps, forming the minimum feature saliency map Smin(x,y)=min(pi(x,y)), where pi(x,y) is the pixel value at coordinate (x,y) of each saliency map, the subscript i denotes the saliency map, i∈{Shh,Svv,Sll,Srr,Shv,Shl,Shr,Svl,Svr,Slr}, and the function min is the operator extracting the minimum of the pixels pi(x,y);
    (132) take, for each coordinate, the maximum of the pixels at that coordinate across the ten directional saliency maps, forming the maximum feature saliency map Smax(x,y)=max(pi(x,y)), where the function max is the operator extracting the maximum of the pixels pi(x,y);
    (133) take, for each coordinate, the mean of the pixels at that coordinate across the ten directional saliency maps, forming the mean feature saliency map Smean(x,y)=mean(pi(x,y)), where the function mean is the operator extracting the mean of the pixels pi(x,y) at the same position;
    (134) in order to preserve in the video frame summary map the edge features of the characters in every direction as completely as possible, while reducing the influence of the illumination changes that readily occur in outdoor videos, the three feature saliency maps above (minimum, maximum and mean), which retain the edge features of different directions, are fused according to the given formula to obtain the final video frame feature summary map fsg.
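Steps (131)-(134) can be sketched as below. The patent's final combination formula is an image lost from this text; a simple superposition (sum) of the minimum, maximum and mean maps is assumed here purely for illustration.

```python
def fuse_summary(saliency_maps):
    """Steps (131)-(134): per pixel, take the minimum, maximum and mean
    across the directional saliency maps and superimpose them into the
    final summary map fsg (superposition assumed here to be a sum)."""
    h, w = len(saliency_maps[0]), len(saliency_maps[0][0])
    fsg = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [m[y][x] for m in saliency_maps]
            fsg[y][x] = min(vals) + max(vals) + sum(vals) / len(vals)
    return fsg
```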
  7. The method according to claim 1, characterized in that step 2 comprises the following operations:
    (21) Perform color clustering, based on the K-means clustering algorithm, on the hue, saturation, value (HSV) color space of the video frame feature summary map: divide the summary map into four classes of K-means color clustering results representing four regions: background, foreground characters, character outlines and noise;
    (22) connected-component processing based on stroke width: for the above four classes of K-means color clustering results, compute the edge-pixel stroke width of each connected component respectively, then analyze each component based on its stroke width, and delete the background and noise regions;
    (23) for the smaller noise regions that still remain after step (22), perform connected-component processing based on geometry: compute the number of pixels contained in each connected component of the character image, and delete the components judged to be noise, so as to improve the quality of the target image;
    (24) obtain the scene text region: analyze all connected components in the four clustering results and merge the components finally retained from all clusters into a single image; then, according to two measures of each connected component, distance and stroke width, judge nearby components to belong to the same region, thereby obtaining the final video scene text.
  8. The method according to claim 6, characterized in that the color clustering of the K-means clustering algorithm is realized as a four-dimensional clustering over the spatial position, hue, saturation and value of the video summary map, according to the cosine distance of the angle between each pixel and the four cluster centers. Because of illumination changes in outdoor environments, a single character may present different colors in the video frame, which seriously impairs the completeness of the extracted strokes; therefore the above four-dimensional clustering is used and is computed with the angle-cosine distance function, which does not emphasize the numerical differences themselves, so that the influence of outdoor illumination changes on character color can be significantly reduced.
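A sketch of K-means with the angle-cosine distance of claim 8, on generic feature tuples (e.g. a spatial coordinate plus H, S, V). The iteration count, random initialization and exact feature layout are illustrative assumptions, not the patent's.

```python
import math
import random

def cosine_dist(a, b):
    """1 minus cosine similarity; insensitive to vector magnitude, so two
    pixels of the same hue under different illumination remain close."""
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(y * y for y in b)))
    return 1.0 - num / den if den else 1.0

def kmeans_cosine(points, k=4, iters=20, seed=0):
    """Plain Lloyd iteration using cosine distance for the assignment
    step; returns per-point labels and the final cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: cosine_dist(p, centers[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster emptied out
                centers[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    labels = [min(range(k), key=lambda c: cosine_dist(p, centers[c]))
              for p in points]
    return labels, centers
```

Because cosine distance ignores magnitude, a darker and a brighter sample of the same colour direction fall into the same cluster, which is the illumination robustness the claim describes.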
  9. The method according to claim 6, characterized in that step (22) comprises the following operations:
    (221) Based on the ten saliency maps of step 1, compute the gradient direction angle θ of each pixel in the video frame summary map;
    (222) since character regions neither appear on nor touch the video image border, delete the connected components in the video summary map that are connected to the top, bottom, left or right border of the image;
    (223) obtain the boundary pixels of each connected component; then, from each boundary pixel, search forward along its gradient direction angle θ; when another boundary pixel is found, the number of pixels between the two boundary pixels is set as the stroke width of both pixels;
    (224) first compute the stroke width of all boundary pixels of the same connected component, then compute the variance of these stroke widths; if the variance is less than 0.5, the boundary-pixel stroke widths of the component are considered close to the true value and the component is retained as a candidate character region; otherwise the component is considered not to belong to a character and is deleted.
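Steps (221)-(223) can be sketched as a ray march on a binary boundary map; the unit step size, rounding to the pixel grid, and the search cut-off are illustrative assumptions.

```python
import math

def stroke_width_at(edge, theta, y, x, max_len=50):
    """March from boundary pixel (y, x) along the gradient direction
    theta until another boundary pixel is hit; the pixel count between
    the two endpoints, inclusive, is taken as the stroke width of both
    (step 223). edge is a 2-D 0/1 map. Returns None when no opposite
    boundary is found within max_len steps."""
    dy, dx = math.sin(theta), math.cos(theta)
    fy, fx = float(y), float(x)
    for step in range(1, max_len):
        fy += dy
        fx += dx
        iy, ix = int(round(fy)), int(round(fx))
        if not (0 <= iy < len(edge) and 0 <= ix < len(edge[0])):
            return None  # ray left the image without hitting a boundary
        if edge[iy][ix]:
            return step + 1  # count both boundary pixels
    return None
```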
  10. The method according to claim 8, characterized in that: because the region aspect ratio of Western and Chinese characters is a set value, i.e. the stroke widths of Western or Chinese characters are similar, the regions of the connected component with larger length and width are deleted; for said connected component its major axis length is computed, and if the major axis length is greater than one third, or less than one tenth, of the width of the video frame feature summary map, the component is considered not to belong to a character region and is deleted.
CN201711381971.5A 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram Active CN108038458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711381971.5A CN108038458B (en) 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram


Publications (2)

Publication Number Publication Date
CN108038458A true CN108038458A (en) 2018-05-15
CN108038458B CN108038458B (en) 2021-04-09

Family

ID=62099983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711381971.5A Active CN108038458B (en) 2017-12-20 2017-12-20 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram

Country Status (1)

Country Link
CN (1) CN108038458B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276461A (en) * 2008-03-07 2008-10-01 北京航空航天大学 Method for increasing video text with edge characteristic
CN101515325A (en) * 2009-04-08 2009-08-26 北京邮电大学 Character extracting method in digital video based on character segmentation and color cluster
CN104751153A (en) * 2013-12-31 2015-07-01 中国科学院深圳先进技术研究院 Scene text recognizing method and device
WO2017089865A1 (en) * 2015-11-24 2017-06-01 Czech Technical University In Prague, Department Of Cybernetics Efficient unconstrained stroke detector
CN106874905A (en) * 2017-01-12 2017-06-20 中南大学 A kind of method of the natural scene text detection based on self study Color-based clustering
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAO DONG HUANG ET AL.: "Video Text Detection Based on Text Edge Map", 2013 3rd International Conference on Computer Science and Network Technology (ICCSNT) *
XIAODONG HUANG ET AL.: "Video Text Extraction Based on Stroke Width and Color", Proceedings of 3rd International Conference on Multimedia Technology (ICMT-13) *
XIAODONG HUANG: "Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform", Multimedia Tools and Applications *
HUANG Xiaodong: "Research on Video Text Acquisition Based on Feature Fusion", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829458A (en) * 2019-01-14 2019-05-31 上海交通大学 The method of the journal file of record system operatio behavior is automatically generated in real time
CN109829458B (en) * 2019-01-14 2023-04-04 上海交通大学 Method for automatically generating log file for recording system operation behavior in real time
CN110347870A (en) * 2019-06-19 2019-10-18 西安理工大学 The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method
CN110472550A (en) * 2019-08-02 2019-11-19 南通使爱智能科技有限公司 A kind of text image shooting integrity degree judgment method and system
CN113192033A (en) * 2021-04-30 2021-07-30 深圳市创想三维科技有限公司 Wire drawing distinguishing method, device, equipment and storage medium in 3D printing
CN113192033B (en) * 2021-04-30 2024-03-19 深圳市创想三维科技股份有限公司 Wire drawing judging method, device and equipment in 3D printing and storage medium


Similar Documents

Publication Publication Date Title
Yan et al. A fast uyghur text detector for complex background images
Gopalakrishnan et al. Salient region detection by modeling distributions of color and orientation
Jiang et al. Automatic salient object segmentation based on context and shape prior.
Maas et al. Using pattern recognition to automatically localize reflection hyperbolas in data from ground penetrating radar
CN106250895B (en) A kind of remote sensing image region of interest area detecting method
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
Wang et al. Background-driven salient object detection
CN108038458A (en) Outdoor Scene text automatic obtaining method in the video of feature based summary figure
US20100008576A1 (en) System and method for segmentation of an image into tuned multi-scaled regions
AU2014277853A1 (en) Object re-identification using self-dissimilarity
Wang et al. Airport detection in remote sensing images based on visual attention
Hu et al. Clothing segmentation using foreground and background estimation based on the constrained Delaunay triangulation
CN107688806A (en) A kind of free scene Method for text detection based on affine transformation
CN106874942B (en) Regular expression semantic-based target model rapid construction method
CN110728302A (en) Method for identifying color textile fabric tissue based on HSV (hue, saturation, value) and Lab (Lab) color spaces
CN104123554A (en) SIFT image characteristic extraction method based on MMTD
CN108537816A (en) A kind of obvious object dividing method connecting priori with background based on super-pixel
Ming et al. A blob detector in color images
CN108256518A (en) Detection method and detection device for character region
Hati et al. An image texture insensitive method for saliency detection
Hu et al. Fast face detection based on skin color segmentation using single chrominance Cr
Gui et al. A fast caption detection method for low quality video images
Wang et al. Character segmentation of color images from digital camera
Si-ming et al. Moving shadow detection based on Susan algorithm
Hong et al. A real-time critical part detection for the blurred image of infrared reconnaissance balloon with boundary curvature feature analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant