CN111899202A - Method for enhancing superimposed time characters in video image - Google Patents

Method for enhancing superimposed time characters in video image

Info

Publication number
CN111899202A
Authority
CN
China
Prior art keywords
image
time
character
characters
background
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010422327.3A
Other languages
Chinese (zh)
Other versions
CN111899202B (en)
Inventor
Nie Hui
Yang Xiaobo
Li Jun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Eastwit Technology Co ltd
Original Assignee
Wuhan Eastwit Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Eastwit Technology Co ltd filed Critical Wuhan Eastwit Technology Co ltd
Priority to CN202010422327.3A
Publication of CN111899202A
Application granted
Publication of CN111899202B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration by the use of local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates in particular to a method for identifying and enhancing the time annotation information of video images. The method comprises the following steps: training UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image; and, by means of this extraction model, applying graying suppression to the background of the image under test so as to enhance the legibility of the time characters. Aimed at the characteristics of characters in natural-scene surveillance images, the invention realizes a time annotation enhancement method and overcomes the outstanding difficulty of recognizing 'substrate-free' time characters superimposed on video images. The invention focuses on separating the superimposed characters from the image background and suppressing the latter, and is an image enhancement technique of great application value in the field of scene text recognition.

Description

Method for enhancing superimposed time characters in video image
Technical Field
The invention belongs to the field of computer vision and is suitable for detecting time characters superimposed on the pictures of video surveillance systems in public security and related industries. It relates in particular to a method for identifying and enhancing the time annotation information of video images.
Background
With the development of social security management, identifying the time annotation information in massive video surveillance images has an evident and special application value for technical investigation work in the public security industry; it is also one of the items by which public security departments assess the operation and maintenance of national video image networking application platforms.
According to the implementation requirements of GA/T 751-2008 (video image text annotation specification), time characters superimposed on a natural-scene image may not use a 'substrate' block to cover the background. The characters are overlaid directly on a random outdoor surveillance scene, so the background remains visible between the strokes of each character and in the gaps between adjacent characters; random illumination, background clutter and similar interference therefore make the time characters difficult to recognize.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a character enhancement scheme for 'substrate-free' time characters superimposed on natural-scene images, overcoming the difficulty of identifying the time annotation information of video images in the prior art.
To solve this problem, the basic technical idea of the invention is to train UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image, and then, by means of this extraction model, to apply graying suppression to the background of the original image under test so as to enhance the legibility of the time characters.
To this end, the invention provides an enhancement method for time characters superimposed on a video image, comprising the following steps:
step i, generating UNet training samples in customized batches;
step ii, training an extraction model for the time character pixels in an image using UNet;
step iii, suppressing the background of the image under test based on the mask obtained from the time character extraction model.
Preferably, step i, generating UNet training samples in customized batches, comprises:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in various fonts, and superimposing them without a 'substrate' to form the training input samples;
1-2) taking an all-black image of the same size as the background and superimposing white time characters, identical in content and all other characteristics, at the same coordinates as in the input sample, to form the one-to-one corresponding extraction target samples.
Preferably, step ii, training the extraction model for the time character pixels in an image using UNet, specifically comprises:
2-1) setting the feature-extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group contains N convolution layers, each with BatchNorm and ReLU operations;
after the M groups of pooling, a single convolution layer adjusts the number of channels to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group contains L convolution layers;
the output matrix of each upsampling layer is concatenated in turn with the output matrix of the corresponding downsampling convolution layer;
after the K groups of upsampling, a single convolution layer reduces the number of channels to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
Convolution layer configuration: number of output channels maps, kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
a sigmoid activation function is applied to the final output prediction matrix, mapping its values into the range 0 to 1; where a value exceeds the threshold S, the corresponding position is a character pixel.
Preferably, in step iii, suppressing the background of the image under test based on the mask obtained from the time character extraction model specifically comprises:
3-1) using the trained UNet time character extraction model to segment the time characters from the image background in the image under test, obtaining a time character instance segmentation mask;
3-2) according to the mask, applying a graying treatment with a linear attenuation algorithm to the background regions of the original image under test that are not time character strokes (pixels).
This completes the training of the time character extraction model; the time characters and the background regions of the image under test can then be segmented at instance level and the legibility of the time characters enhanced, achieving the technical purpose of the invention.
The beneficial effects of the invention include:
1) Aimed at the characteristics of characters in natural-scene surveillance images, the method realizes a time annotation enhancement method and overcomes the outstanding difficulty of recognizing the 'substrate-free' time characters superimposed on video images.
2) The invention focuses on separating the superimposed characters from the image background and suppressing the latter, and is an image enhancement technique of great application value in the field of scene text recognition.
Drawings
The technical solution of the present invention will be further specifically described with reference to the accompanying drawings and the detailed description.
FIG. 1 is a basic flow diagram of the method of the present invention;
FIG. 2 is an example of a training input sample and extraction target sample for the UNet extraction model;
FIG. 3 is an example original image under test containing time characters, as input to the UNet extraction model;
FIG. 4 is an example of the mask produced by the UNet extraction model for the time characters in the original image under test;
FIG. 5 is an example of mask-based background suppression by the UNet extraction model for the time characters in the original image under test.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, the present invention provides the overall flow of an enhancement method for time characters superimposed on a video image, which mainly comprises the following steps:
step i, generating UNet training samples in customized batches;
step ii, training an extraction model for the time character pixels in an image using UNet;
step iii, suppressing the background of the image under test based on the mask obtained from the time character extraction model.
Step i comprises the following subdivision steps:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in various fonts, and superimposing them without a 'substrate' to form the training input samples;
Ω_i, i ∈ {0, 1, ..., K}
K characters are drawn, where Ω_i denotes the pixel region to be drawn by the i-th character (only that character's stroke pixels) and Ω_0 denotes the background pixels other than the character pixels.
I[x, y] = (R[x, y] + G[x, y] + B[x, y]) / 3
d(i) = (1 / ‖Ω_i‖) · Σ_{[x, y] ∈ Ω_i} I[x, y]
I[x, y] denotes the average RGB luminance of the pixel [x, y], ‖Ω_i‖ denotes the number of background pixels covered by the i-th character, and d(i) denotes the average luminance of the background pixels covered by the i-th character.
The rendering function for character generation is as follows:
f_i[x, y] = [0, 0, 0, 1] if d(i) ≥ 0.5, and [1, 1, 1, 1] otherwise, for [x, y] ∈ Ω_i; pixels outside every Ω_i remain [0, 0, 0, 0]
RGBA encoding is used: [0, 0, 0, 0] denotes a colorless transparent pixel, [0, 0, 0, 1] a black pixel, and [1, 1, 1, 1] a white pixel. Each character is thus drawn black on a bright background and white on a dark one.
f′ = (1.0 − α) ∘ b + f
The character image f is blended with a background image b, where α denotes the image formed by the transparency channel of f, and ∘ is a matrix operator denoting element-wise multiplication of two matrices of the same size.
1-2) taking an all-black image of the same size as the background and superimposing white time characters, identical in content and all other characteristics, at the same coordinates as in the input sample, to form the one-to-one corresponding extraction target samples;
t = α
α, the transparency channel from the previous operation (whose character pixels have the value 1), is used directly as the extraction target sample t, a white-character image.
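As a minimal sketch of steps 1-1) and 1-2) (assuming Pillow and NumPy; the timestamp string, font path, drawing position and function name are hypothetical, and d(i) is approximated per string rather than per character), the following Python fragment draws a time string over a random background, chooses black or white from the background luminance, composites with f′ = (1.0 − α) ∘ b + f, and returns the α channel as the target sample t:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def make_sample_pair(bg_path, text="2020-05-19 12:00:00",
                     font_path="simhei.ttf", xy=(20, 20)):
    """Sketch of steps 1-1)/1-2): returns (input sample, target sample t)."""
    b = np.asarray(Image.open(bg_path).convert("RGB"), np.float32) / 255.0
    h, w = b.shape[:2]

    # Draw the characters on a colorless transparent RGBA canvas.
    canvas = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, 24)

    # d(i): average luminance of the background pixels the characters cover.
    x0, y0, x1, y1 = draw.textbbox(xy, text, font=font)
    d = float(b[y0:y1, x0:x1].mean()) if x1 > x0 and y1 > y0 else 0.0
    color = (0, 0, 0, 255) if d >= 0.5 else (255, 255, 255, 255)
    draw.text(xy, text, font=font, fill=color)

    f = np.asarray(canvas, np.float32) / 255.0
    alpha = f[..., 3:4]                      # transparency channel of f

    # f' = (1.0 - alpha) o b + f, with f's RGB premultiplied by alpha.
    input_sample = (1.0 - alpha) * b + f[..., :3] * alpha
    target_sample = alpha[..., 0]            # t = alpha (white on black)
    return input_sample, target_sample
```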
As shown in fig. 2, the upper and lower portions represent the input sample and the target sample, respectively.
Step ii comprises the following subdivision steps:
2-1) setting the feature-extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group contains N convolution layers, each with BatchNorm and ReLU operations;
in this embodiment, M = 4 and N = 2.
After the M groups of pooling, a single convolution layer adjusts the number of channels to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group contains L convolution layers;
in this embodiment, K = 4 and L = 2.
The output matrix of each upsampling layer is concatenated in turn with the output matrix of the corresponding downsampling convolution layer;
after the K groups of upsampling, a single convolution layer reduces the number of channels to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
Convolution layer configuration: number of output channels maps, kernel size k, stride s, padding p;
in this embodiment, maps starts at 64 during downsampling and returns to 64 during upsampling; k, s and p take the values 3 × 3, 1 and 1 respectively.
Pooling and upsampling configuration: sliding window size Window, stride s, padding p;
in this embodiment, Window, s and p take the values 2 × 2, 2 and 0 respectively.
The length and width of the original input image must be integer multiples of a;
in this embodiment, a = 16 (the image is downsampled by a factor of 2 four times).
A sigmoid activation function is applied to the final output prediction matrix, mapping its values into the range 0 to 1; where a value exceeds the threshold S, the corresponding position is a character pixel:
σ(x) = 1 / (1 + e^(−x))
In this embodiment the threshold S is 0.5, and x is the prediction matrix output by the UNet network, whose values range over the real numbers.
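As an illustrative sketch only (assuming PyTorch; the class and helper names are ours, the bridge width is our simplification, and the patent does not specify how upsampling is realized, so transposed convolution is an assumption), the following model follows the structure above with the embodiment's values M = K = 4, N = L = 2, channels starting at 64, 3 × 3 convolutions with stride 1 and padding 1, 2 × 2 pooling and upsampling, and a final single-channel output thresholded at S = 0.5 after a sigmoid:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, n_layers=2):            # N = L = 2 convolutions
    layers = []
    for i in range(n_layers):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, 1, 1),
                   nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class UNetTimeChar(nn.Module):
    """Sketch of the patent's UNet: M = K = 4, channels 64-128-256-512."""
    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512]
        self.downs = nn.ModuleList(
            conv_block(3 if i == 0 else chs[i - 1], chs[i]) for i in range(4))
        self.pool = nn.MaxPool2d(2, 2, 0)            # Window 2x2, stride 2, pad 0
        self.bridge = nn.Conv2d(512, 512, 1)         # single conv adjusts channels
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(chs[i] if i == 3 else chs[i + 1], chs[i], 2, 2)
            for i in reversed(range(4)))
        self.up_convs = nn.ModuleList(
            conv_block(2 * chs[i], chs[i]) for i in reversed(range(4)))
        self.head = nn.Conv2d(64, 1, 1)              # reduce channels to 1

    def forward(self, x):                            # H, W multiples of a = 16
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bridge(x)
        for up, conv, skip in zip(self.ups, self.up_convs, reversed(skips)):
            x = up(x)
            x = conv(torch.cat([x, skip], dim=1))    # skip connection
        return self.head(x)

# Prediction: character pixels wherever sigmoid(x) > S = 0.5.
model = UNetTimeChar()
logits = model(torch.randn(1, 3, 256, 256))
mask = (torch.sigmoid(logits) > 0.5).float()
```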
Step iii comprises the following subdivision steps:
3-1) segmenting time characters and image backgrounds in the image to be detected by using a trained UNet time character extraction model to obtain a time character instance segmentation mask;
let n be the original image and m be the resulting example split mask.
If m [ i, j ] ═ 1 (white) indicates that n [ i, j ] is a character pixel, and conversely, it is a background pixel.
3-2) according to the mask shown in FIG. 4, apply a graying treatment with a linear attenuation algorithm to the background regions of the original image under test that are not time character strokes (pixels);
n′ = m ∘ n + (1 − m) ∘ n / (1 + k)
where n′ denotes the image after background attenuation and ∘, as above, is the matrix operator denoting element-wise multiplication of two matrices of the same size.
The attenuation control coefficient k ≥ 0; the larger k is, the more pronounced the attenuation effect.
In this embodiment, k = 8.
Finally, the original image is suppressed based on the time character mask, producing the output shown in FIG. 5.
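A minimal NumPy sketch of step 3-2), under the attenuation formula as reconstructed above (the function name is ours; n and m are float arrays with values in [0, 1]):

```python
import numpy as np

def suppress_background(n, m, k=8.0):
    """Sketch of step 3-2): keep character strokes, attenuate the background.

    n : original image under test, float array of shape (H, W, 3) in [0, 1]
    m : instance segmentation mask from the UNet model, shape (H, W), {0, 1}
    k : attenuation control coefficient, k >= 0 (k = 8 in the embodiment);
        the larger k is, the darker the non-character background becomes
    """
    m3 = m[..., None]                        # broadcast the mask over channels
    # n' = m o n + (1 - m) o n / (1 + k), element-wise over equal-size arrays
    return m3 * n + (1.0 - m3) * n / (1.0 + k)
```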
This completes the training of the time character extraction model; the time characters and the background regions of the image under test can be segmented at instance level and the legibility of the time characters enhanced, realizing the technical scheme of the invention.
It will be clear to those skilled in the art that the specific values of the above parameters and thresholds can be adjusted according to the sample training method and how strictly the specification is implemented; they do not limit the present invention.
Finally, it should be noted that the above embodiments are only preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that the technical solutions described therein may be modified, or equivalents substituted for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (4)

1. An enhancement method for time characters superimposed on a video image, comprising the steps of:
step i, generating UNet training samples in customized batches;
step ii, training an extraction model for the time character pixels in an image using UNet;
step iii, suppressing the background of the image under test based on the mask obtained from the time character extraction model.
2. The method according to claim 1, wherein step i, generating UNet training samples in customized batches, comprises:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in various fonts, and superimposing them without a 'substrate' to form the training input samples;
1-2) taking an all-black image of the same size as the background and superimposing white time characters, identical in content and all other characteristics, at the same coordinates as in the input sample, to form the one-to-one corresponding extraction target samples.
3. The enhancement method for time characters superimposed on a video image according to claim 1, wherein step ii, training the extraction model for the time character pixels in an image using UNet, comprises:
2-1) setting the feature-extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group contains N convolution layers, each with BatchNorm and ReLU operations;
after the M groups of pooling, a single convolution layer adjusts the number of channels to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group contains L convolution layers;
the output matrix of each upsampling layer is concatenated in turn with the output matrix of the corresponding downsampling convolution layer;
after the K groups of upsampling, a single convolution layer reduces the number of channels to 1 to output the final features;
2-2) defining the training parameters and outputting the segmentation model
Convolution layer configuration: number of output channels maps, kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
a sigmoid activation function is applied to the final output prediction matrix, mapping its values into the range 0 to 1; where a value exceeds the threshold S, the corresponding position is a character pixel.
4. The enhancement method for time characters superimposed on a video image according to claim 1, wherein in step iii, suppressing the background of the image under test based on the mask obtained from the time character extraction model specifically comprises:
3-1) using the trained UNet time character extraction model to segment the time characters from the image background in the image under test, obtaining a time character instance segmentation mask;
3-2) according to the mask, applying a graying treatment with a linear attenuation algorithm to the background regions of the original image under test that are not time character strokes (pixels).
CN202010422327.3A 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image Active CN111899202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Publications (2)

Publication Number Publication Date
CN111899202A 2020-11-06
CN111899202B 2024-03-15

Family

ID=73207449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010422327.3A Active CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Country Status (1)

Country Link
CN (1) CN111899202B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947529A (en) * 2021-10-14 2022-01-18 万翼科技有限公司 Image enhancement method, model training method, component identification method and related equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6366699B1 (en) * 1997-12-04 2002-04-02 Nippon Telegraph And Telephone Corporation Scheme for extractions and recognitions of telop characters from video data
JP2005234786A (en) * 2004-02-18 2005-09-02 Nippon Telegr & Teleph Corp <Ntt> Video keyword extraction method, device and program
CN101546427A (en) * 2008-02-27 2009-09-30 西门子电脑辅助诊断有限公司 Method of suppressing obscuring features in an image
CN107292854A (en) * 2017-08-02 2017-10-24 大连海事大学 Grayscale image enhancement method based on local singularity quantitative analysis
CN108805042A * 2018-05-25 2018-11-13 武汉东智科技股份有限公司 Method for detecting occlusion of road-area surveillance video by foliage
CN109948510A * 2019-03-14 2019-06-28 北京易道博识科技有限公司 Document image instance segmentation method and device
CN110659574A * 2019-08-22 2020-01-07 北京易道博识科技有限公司 Method and system for outputting text line contents after state recognition of document image check boxes
CN111079745A (en) * 2019-12-11 2020-04-28 中国建设银行股份有限公司 Formula identification method, device, equipment and storage medium
CN111126396A (en) * 2019-12-25 2020-05-08 北京科技大学 Image recognition method and device, computer equipment and storage medium
CN111160335A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Image watermarking processing method and device based on artificial intelligence and electronic equipment


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAOJIN YANG et al.: "Content Based Lecture Video Retrieval Using Speech and Video Text Information", IEEE Transactions on Learning Technologies, vol. 7, no. 2, pages 142-154, XP011552591, DOI: 10.1109/TLT.2014.2307305 *
PEI YIN et al.: "Automatic time stamp extraction system for home videos", 2002 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1-4 *
LIU Yang: "Timestamp Text Recognition in Security Surveillance Scenes", China Masters' Theses Full-text Database, Information Science and Technology, pages 138-350 *
ZHOU Dong'ao; LIN Jiayu: "A Survey of Text Detection in Video Images", Computer Engineering & Science, no. 04, pages 760-764 *
BAO Fumin, LI Aiguo, QIN Zheng: "Timestamp Recognition in Color Photographs", Journal of Fudan University (Natural Science), no. 05, pages 914-917 *


Also Published As

Publication number Publication date
CN111899202B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
US10614574B2 (en) Generating image segmentation data using a multi-branch neural network
CN105719247B Single-image defogging method based on feature learning
CN111242837B Face anonymization privacy protection method based on generative adversarial network
CN104537615A (en) Local Retinex enhancement algorithm based on HSV color spaces
CN104766071B Fast traffic light detection algorithm applied to driverless vehicles
CN111898606B (en) Night imaging identification method for superimposing transparent time characters in video image
CN109815948B (en) Test paper segmentation algorithm under complex scene
CN113240679A (en) Image processing method, image processing device, computer equipment and storage medium
CN105118027A (en) Image defogging method
Sihotang Implementation of Gray Level Transformation Method for Sharping 2D Images
CN106933579A (en) Image rapid defogging method based on CPU+FPGA
CN113468996A (en) Camouflage object detection method based on edge refinement
CN107563476B (en) Two-dimensional code beautifying and anti-counterfeiting method
CN116152173A (en) Image tampering detection positioning method and device
CN111899202B (en) Enhancement method for superimposed time character in video image
CN110880164B (en) Image processing method, device, equipment and computer storage medium
CN113012068A (en) Image denoising method and device, electronic equipment and computer readable storage medium
CN108764287A (en) Object detection method and system based on deep learning and grouping convolution
CN117036216A (en) Data generation method and device, electronic equipment and storage medium
CN111815733A (en) Video coloring method and system
CN114519788A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN102930542A (en) Detection method for vector saliency based on global contrast
CN116309233A (en) Infrared and visible light image fusion method based on night vision enhancement
CN115984133A (en) Image enhancement method, vehicle snapshot method, device and medium
CN111402223B (en) Transformer substation defect problem detection method using transformer substation video image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant