CN111899202B - Enhancement method for superimposed time character in video image

Enhancement method for superimposed time character in video image

Info

Publication number
CN111899202B
CN111899202B (application CN202010422327.3A)
Authority
CN
China
Prior art keywords
image
time
character
background
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010422327.3A
Other languages
Chinese (zh)
Other versions
CN111899202A (en)
Inventor
聂晖
杨小波
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Eastwit Technology Co ltd
Original Assignee
Wuhan Eastwit Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Eastwit Technology Co ltd
Priority to CN202010422327.3A
Publication of CN111899202A
Application granted
Publication of CN111899202B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration by the use of local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates to a method for identifying and enhancing the time annotation information of video images. The method comprises the following steps: training UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image; then, guided by this extraction model, graying and suppressing the background of the original image to be detected so as to enhance the recognizability of the time characters. Aimed at the character features of natural-scene surveillance images, the invention provides a method for enhancing time annotation information and addresses a key difficulty in recognizing the 'substrate-free' time characters superimposed on video images. The invention centers on 'separation-suppression' processing of the superimposed characters and the image background, and is an image enhancement technique of considerable application value in the field of scene text recognition.

Description

Enhancement method for superimposed time character in video image
Technical Field
The invention belongs to the field of computer vision and is suitable for detecting the time characters superimposed on the pictures of video surveillance systems in public security and related industries. It relates in particular to a method for identifying and enhancing the time annotation information of video images.
Background
With the development of public security administration, identifying the time annotation information in large volumes of video surveillance images has significant and specialized application value for technical investigation work in the public security industry, and is also among the items that public security authorities examine when assessing the operation and maintenance of national video image networking application platforms.
According to the implementation requirements of GA/T 751-2008 ('Video image text labeling specifications'), the time characters superimposed on a natural-scene image may not mask the background with 'substrate' image blocks. Understandably, when characters are superimposed directly on a random outdoor surveillance scene, the background remains visible within the stroke gaps of each character and between adjacent characters; randomly distributed illumination and cluttered background objects then interfere easily, making time-character recognition considerably harder.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a character enhancement scheme for the 'substrate-free' time characters superimposed on natural-scene images, addressing the recognition of video image time annotation information in the prior art.
To solve this problem, the basic technical concept of the invention is as follows: train UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image; then, guided by this extraction model, gray and suppress the background of the original image to be detected so as to enhance the recognizability of the time characters.
Therefore, the invention provides an enhancement method for time characters superimposed in a video image, comprising the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model.
Preferably, in step i, the specific steps of customizing the batch of UNet training samples include:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples.
Preferably, in step ii, the specific steps of training the model for extracting time-character pixels from images with UNet include:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
after the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
the output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
after the K upsampling groups, one single convolution layer reduces the channel count to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
a sigmoid activation function is applied to the final output prediction matrix to map its values into the range 0 to 1; when a feature value exceeds the threshold S, the corresponding position is a character pixel.
Preferably, in step iii, the specific steps of suppressing the background of the image to be detected based on the mask produced by the time-character extraction model include:
3-1) using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask;
3-2) according to the mask, graying the background region of non-time-character strokes (pixels) in the original image to be detected with a linear attenuation algorithm.
Once training of the time-character extraction model is complete, the time characters and the background region of an image to be detected can be segmented at the instance level and the recognizability of the time characters enhanced, achieving the technical purpose of the invention.
The beneficial effects of the invention include:
1) Aimed at the character features of natural-scene surveillance images, the method enhances time annotation information and addresses a key difficulty in recognizing the 'substrate-free' time characters superimposed on video images.
2) The invention centers on 'separation-suppression' processing of the superimposed characters and the image background, and is an image enhancement technique of considerable application value in the field of scene text recognition.
Drawings
The technical scheme of the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is the basic flow chart of the method of the invention;
FIG. 2 shows a training input sample and the corresponding extraction target sample of the UNet extraction model;
FIG. 3 is an example original image to be detected;
FIG. 4 is an example of the mask produced by the UNet extraction model for the time characters in the original image to be detected;
FIG. 5 is an example of mask-based background suppression of the time characters in the original image to be detected.
Detailed Description
The present invention is described in further detail below with reference to the drawings and examples, in order to make its objects, technical solutions and advantages clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the overall flow of the proposed enhancement method for time characters superimposed in a video image mainly comprises the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model.
Step i comprises the following subdivided steps:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
Ω_i, i ∈ {0, 1, ..., K}
K characters are drawn, where Ω_i denotes the set of pixels occupied by the strokes of the i-th character (stroke pixels only) and Ω_0 denotes the background pixels not covered by any character.
Let I[x, y] denote the average RGB luminance of pixel [x, y] and |Ω_i| the number of background pixels covered by the i-th character; the average luminance of the background pixels covered by the i-th character is then
D(i) = (1 / |Ω_i|) · Σ_{[x, y] ∈ Ω_i} I[x, y]
The rendering convention for character generation is as follows:
using RGBA encoding, [0, 0, 0, 0] represents a colorless transparent pixel, [0, 0, 0, 1] a black pixel, and [1, 1, 1, 1] a white pixel.
f' = (1.0 − α) ∘ b + f
The character layer f is blended with the background image b, where α denotes the image formed by the transparency (alpha) channel of f and ∘ denotes a matrix operator: element-wise multiplication of two matrices of equal size.
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples;
t = α
Here α is the white-character image from the preceding operation, whose transparency channel takes the value 1 on character strokes; it is used directly as the extraction target sample t.
As shown in fig. 2, the upper and lower parts show an input sample and the corresponding extraction target sample, respectively.
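The sample synthesis of steps 1-1) and 1-2) can be sketched in Python with Pillow and NumPy as below. This is a minimal sketch under stated assumptions: the font path, text content, coordinates, and the stroke-color rule (black on bright backgrounds, white on dark ones, via D(i)) are illustrative choices, not values fixed by the embodiment.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def make_sample_pair(background: Image.Image, text: str, xy=(20, 20),
                     font_path="DejaVuSans.ttf", font_size=24):
    """Return one (input sample, extraction target sample) pair."""
    w, h = background.size
    font = ImageFont.truetype(font_path, font_size)

    # Character layer f, RGBA, initially fully transparent ([0, 0, 0, 0]).
    layer = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    draw = ImageDraw.Draw(layer)

    # Assumed use of D(i): choose black or white strokes from the mean
    # background luminance under the text box (a simplification of the
    # per-character region Ω_i).
    lum = np.asarray(background.convert("L"), dtype=np.float32) / 255.0
    x0, y0, x1, y1 = map(int, draw.textbbox(xy, text, font=font))
    bright = lum[y0:y1, x0:x1].mean() > 0.5
    color = (0, 0, 0, 255) if bright else (255, 255, 255, 255)
    draw.text(xy, text, font=font, fill=color)

    # Input sample: f' = (1 - α) ∘ b + f, i.e. substrate-free compositing.
    input_sample = Image.alpha_composite(background.convert("RGBA"), layer)

    # Target sample t = α: white strokes on black at the same coordinates.
    alpha = layer.split()[-1]
    target_sample = alpha.point(lambda v: 255 if v > 0 else 0)
    return input_sample.convert("RGB"), target_sample
```

Anti-aliased stroke edges give fractional α values; the thresholded point() call binarizes them so the target remains a clean white-on-black mask, matching the pairing shown in fig. 2.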
Step ii comprises the following subdivided steps:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
in this embodiment, M is 4 and N is 2.
After the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
in this embodiment, K is 4 and L is 2.
The output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
after the K upsampling groups, one single convolution layer reduces the channel count to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
Convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
in this embodiment, downsampling starts at 64 channels and upsampling ends at 64 channels; k, s and p are 3×3, 1 and 1, respectively.
Pooling and upsampling configuration: sliding window size Window, stride s, padding p;
in this embodiment, Window, s and p are 2×2, 2 and 0, respectively.
The length and width of the original input image must be integer multiples of a;
in this embodiment, a is 16.
A sigmoid activation function is applied to the final output prediction matrix x, mapping its values into the range 0 to 1; when a feature value exceeds the threshold S, the corresponding position is a character pixel.
In this embodiment, the threshold S is 0.5; x is the real-valued prediction matrix output by the UNet network.
Step iii comprises the following subdivided steps:
3-1) using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask;
assume the original image is n and the resulting instance segmentation mask is m.
If m[i, j] = 1 (white), then n[i, j] is a character pixel; otherwise it is a background pixel.
3-2) according to the mask shown in fig. 4, graying the background region of non-time-character strokes (pixels) in the original image to be detected with a linear attenuation algorithm;
here n' denotes the image after background attenuation, and ∘ denotes a matrix operator: element-wise multiplication of two matrices of equal size.
The attenuation control coefficient k ≥ 0; the larger k is, the more pronounced the attenuation effect;
in this embodiment, k = 8.
Finally, the original image is suppressed based on the time character mask, resulting in the output shown in fig. 5.
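The attenuation formula itself is not reproduced in this text, so the NumPy sketch below uses an assumed linear attenuation (dividing the grayed background by 1 + k) that is merely consistent with the stated behaviour: graying of background pixels, k ≥ 0, stronger suppression for larger k, character pixels left intact. It is not the patent's exact formula.

```python
import numpy as np

def suppress_background(n: np.ndarray, m: np.ndarray, k: float = 8.0) -> np.ndarray:
    """n: original RGB image, shape (H, W, 3), float values in [0, 1].
    m: instance segmentation mask, shape (H, W); 1 = character pixel."""
    gray = n.mean(axis=2, keepdims=True)      # graying the background region
    attenuated = gray / (1.0 + k)             # assumed linear attenuation, k >= 0
    m3 = m[..., None].astype(n.dtype)
    # n' = m ∘ n + (1 - m) ∘ attenuated: keep strokes, suppress background
    return m3 * n + (1.0 - m3) * attenuated
```

With k = 8 the background collapses to a dark gray while the stroke pixels of n pass through unchanged, matching the contrast effect shown in fig. 5.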
Once training of the time-character extraction model is complete, the time characters and the background region of the image to be detected can be segmented at the instance level and the recognizability of the time characters enhanced, realizing the technical scheme of the invention.
It will be clear to those skilled in the art that the specific parameter values or thresholds described above may be adjusted according to the sample training method and the strictness with which the specification is enforced, and do not constitute a limitation of the present invention.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions described there or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. An enhancement method for time characters superimposed in a video image, comprising the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model, wherein step iii comprises: using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask; assuming the original image is n and the obtained instance segmentation mask is m; and, according to the mask, graying the background region of non-time-character strokes in the original image to be detected with a linear attenuation algorithm;
wherein n' represents the image after background attenuation, and ∘ in the formula represents a matrix operator: element-wise multiplication of two matrices of equal size.
2. The enhancement method for time characters superimposed in a video image according to claim 1, wherein the specific steps of customizing the batch of UNet training samples in step i comprise:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples.
3. The enhancement method for time characters superimposed in a video image according to claim 1, wherein the specific steps of training the model for extracting time-character pixels from images with UNet in step ii comprise:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
after the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
the output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
reducing the channel count to 1 with one single convolution layer after the K upsampling groups, to output the final features;
2-2) defining the training parameters and outputting the segmentation model
convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
and applying a sigmoid activation function to the final output prediction matrix to map its values into the range 0 to 1, a position being a character pixel when its feature value exceeds the threshold S.
CN202010422327.3A 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image Active CN111899202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Publications (2)

Publication Number Publication Date
CN111899202A CN111899202A (en) 2020-11-06
CN111899202B (en) 2024-03-15

Family

ID=73207449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010422327.3A Active CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Country Status (1)

Country Link
CN (1) CN111899202B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947529B (en) * 2021-10-14 2023-01-10 万翼科技有限公司 Image enhancement method, model training method, component identification method and related equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6366699B1 (en) * 1997-12-04 2002-04-02 Nippon Telegraph And Telephone Corporation Scheme for extractions and recognitions of telop characters from video data
JP2005234786A (en) * 2004-02-18 2005-09-02 Nippon Telegr & Teleph Corp <Ntt> Video keyword extraction method, device and program
CN101546427A (en) * 2008-02-27 2009-09-30 西门子电脑辅助诊断有限公司 Method of suppressing obscuring features in an image
CN107292854A (en) * 2017-08-02 2017-10-24 大连海事大学 Grayscale image enhancement method based on local singularity quantitative analysis
CN108805042A (en) * 2018-05-25 2018-11-13 武汉东智科技股份有限公司 The detection method that road area monitor video is blocked by leaf
CN109948510A (en) * 2019-03-14 2019-06-28 北京易道博识科技有限公司 A kind of file and picture example dividing method and device
CN110659574A (en) * 2019-08-22 2020-01-07 北京易道博识科技有限公司 Method and system for outputting text line contents after status recognition of document image check box
CN111079745A (en) * 2019-12-11 2020-04-28 中国建设银行股份有限公司 Formula identification method, device, equipment and storage medium
CN111126396A (en) * 2019-12-25 2020-05-08 北京科技大学 Image recognition method and device, computer equipment and storage medium
CN111160335A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Image watermarking processing method and device based on artificial intelligence and electronic equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Automatic time stamp extraction system for home videos; Pei Yin et al.; 2002 IEEE International Symposium on Circuits and Systems (ISCAS); 1-4 *
Content Based Lecture Video Retrieval Using Speech and Video Text Information; Haojin Yang et al.; IEEE Transactions on Learning Technologies; vol. 7, no. 2; 142-154 *
Timestamp text recognition in security surveillance scenes; Liu Yang; China Master's Theses Full-text Database, Information Science and Technology; I138-350 *
Timestamp recognition in color photographs; Bao Fumin, Li Aiguo, Qin Zheng; Journal of Fudan University (Natural Science) (05); 914-917+922 *
A survey of text detection in video images; Zhou Dong'ao, Lin Jiayu; Computer Engineering & Science (04); 760-764 *

Also Published As

Publication number Publication date
CN111899202A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
US10614574B2 (en) Generating image segmentation data using a multi-branch neural network
CN106682629B (en) Identification algorithm for identity card number under complex background
CN105719247B (en) Single image to the fog method based on feature learning
CN111898606B (en) Night imaging identification method for superimposing transparent time characters in video image
CN109815948B (en) Test paper segmentation algorithm under complex scene
CN112990220B (en) Intelligent identification method and system for target text in image
CN115359370B (en) Remote sensing image cloud detection method and device, computer device and storage medium
CN113468996A (en) Camouflage object detection method based on edge refinement
CN106933579A (en) Image rapid defogging method based on CPU+FPGA
Sihotang Implementation of Gray Level Transformation Method for Sharping 2D Images
CN111899202B (en) Enhancement method for superimposed time character in video image
CN107563476B (en) Two-dimensional code beautifying and anti-counterfeiting method
CN116152173A (en) Image tampering detection positioning method and device
CN108171220A (en) Road automatic identifying method based on full convolutional neural networks Yu CRF technologies
CN113012068A (en) Image denoising method and device, electronic equipment and computer readable storage medium
CN114519788A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN108764287A (en) Object detection method and system based on deep learning and grouping convolution
CN110533027B (en) Text detection and identification method and system based on mobile equipment
CN115984133A (en) Image enhancement method, vehicle snapshot method, device and medium
CN113901913A (en) Convolution network for ancient book document image binaryzation
CN115439850A (en) Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN111402223B (en) Transformer substation defect problem detection method using transformer substation video image
CN114155541A (en) Character recognition method and device, terminal equipment and storage medium
CN115100060A (en) Low-illumination traffic image enhancement method based on image enhancement model
CN112907605A (en) Data enhancement method for instance segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant