CN111899202B - Enhancement method for superimposed time character in video image

Enhancement method for superimposed time character in video image

Info

Publication number
CN111899202B
CN111899202B (application CN202010422327.3A)
Authority
CN
China
Prior art keywords
image
time
character
background
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010422327.3A
Other languages
Chinese (zh)
Other versions
CN111899202A (en)
Inventor
聂晖
杨小波
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Eastwit Technology Co ltd
Original Assignee
Wuhan Eastwit Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Eastwit Technology Co ltd
Priority to CN202010422327.3A
Publication of CN111899202A
Application granted
Publication of CN111899202B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration by the use of local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates to a method for identifying and enhancing the time annotation information of video images. The method comprises the following steps: training UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image; then, guided by this extraction model, graying and suppressing the background of the original image to be detected so as to enhance the recognizability of the time characters. Aimed at the character features of natural-scene surveillance images, the invention provides a method for enhancing time annotation information and addresses a key difficulty in recognizing the 'substrate-free' time characters superimposed on video images. The invention centers on 'separation-suppression' processing of the superimposed characters and the image background, and is an image enhancement technique of considerable application value in the field of scene text recognition.

Description

Enhancement method for superimposed time character in video image
Technical Field
The invention belongs to the field of computer vision and is suitable for detecting the time characters superimposed on the pictures of video surveillance systems in public security and related industries. It relates in particular to a method for identifying and enhancing the time annotation information of video images.
Background
With the development of public security administration, identifying the time annotation information in large volumes of video surveillance images has significant and specialized application value for technical investigation work in the public security industry, and is also among the items that public security authorities examine when assessing the operation and maintenance of national video image networking application platforms.
According to the implementation requirements of GA/T 751-2008 ('Video image text labeling specifications'), the time characters superimposed on a natural-scene image may not mask the background with 'substrate' image blocks. Understandably, when characters are superimposed directly on a random outdoor surveillance scene, the background remains visible within the stroke gaps of each character and between adjacent characters; randomly distributed illumination and cluttered background objects then interfere easily, making time-character recognition considerably harder.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a character enhancement scheme for the 'substrate-free' time characters superimposed on natural-scene images, addressing the recognition of video image time annotation information in the prior art.
To solve this problem, the basic technical concept of the invention is as follows: train UNet (an image segmentation neural network) to obtain a pixel-level model for extracting the time characters in an image; then, guided by this extraction model, gray and suppress the background of the original image to be detected so as to enhance the recognizability of the time characters.
Therefore, the invention provides an enhancement method for time characters superimposed in a video image, comprising the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model.
Preferably, in step i, the specific steps of customizing the batch of UNet training samples include:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples.
Preferably, in step ii, the specific steps of training the model for extracting time-character pixels from images with UNet include:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
after the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
the output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
after the K upsampling groups, one single convolution layer reduces the channel count to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
a sigmoid activation function is applied to the final output prediction matrix to map its values into the range 0 to 1; when a feature value exceeds the threshold S, the corresponding position is a character pixel.
Preferably, in step iii, the specific steps of suppressing the background of the image to be detected based on the mask produced by the time-character extraction model include:
3-1) using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask;
3-2) according to the mask, graying the background region of non-time-character strokes (pixels) in the original image to be detected with a linear attenuation algorithm.
Once training of the time-character extraction model is complete, the time characters and the background region of an image to be detected can be segmented at the instance level and the recognizability of the time characters enhanced, achieving the technical purpose of the invention.
The beneficial effects of the invention include:
1) Aimed at the character features of natural-scene surveillance images, the method enhances time annotation information and addresses a key difficulty in recognizing the 'substrate-free' time characters superimposed on video images.
2) The invention centers on 'separation-suppression' processing of the superimposed characters and the image background, and is an image enhancement technique of considerable application value in the field of scene text recognition.
Drawings
The technical scheme of the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is the basic flow chart of the method of the invention;
FIG. 2 shows a training input sample and the corresponding extraction target sample of the UNet extraction model;
FIG. 3 is an example original image to be detected;
FIG. 4 is an example of the mask produced by the UNet extraction model for the time characters in the original image to be detected;
FIG. 5 is an example of mask-based background suppression of the time characters in the original image to be detected.
Detailed Description
The present invention is described in further detail below with reference to the drawings and examples, in order to make its objects, technical solutions and advantages clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the overall flow of the proposed enhancement method for time characters superimposed in a video image mainly comprises the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model.
Step i comprises the following subdivided steps:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
Ω_i, i ∈ {0, 1, ..., K}
K characters are drawn, where Ω_i denotes the set of pixels occupied by the strokes of the i-th character (stroke pixels only) and Ω_0 denotes the background pixels not covered by any character.
Let I[x, y] denote the average RGB luminance of pixel [x, y] and |Ω_i| the number of background pixels covered by the i-th character; the average luminance of the background pixels covered by the i-th character is then
D(i) = (1 / |Ω_i|) · Σ_{[x, y] ∈ Ω_i} I[x, y]
The rendering convention for character generation is as follows:
using RGBA encoding, [0, 0, 0, 0] represents a colorless transparent pixel, [0, 0, 0, 1] a black pixel, and [1, 1, 1, 1] a white pixel.
f' = (1.0 − α) ∘ b + f
The character layer f is blended with the background image b, where α denotes the image formed by the transparency (alpha) channel of f and ∘ denotes a matrix operator: element-wise multiplication of two matrices of equal size.
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples;
t = α
Here α is the white-character image from the preceding operation, whose transparency channel takes the value 1 on character strokes; it is used directly as the extraction target sample t.
As shown in fig. 2, the upper and lower parts show an input sample and the corresponding extraction target sample, respectively.
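The sample synthesis of steps 1-1) and 1-2) can be sketched in Python with Pillow and NumPy as below. This is a minimal sketch under stated assumptions: the font path, text content, coordinates, and the stroke-color rule (black on bright backgrounds, white on dark ones, via D(i)) are illustrative choices, not values fixed by the embodiment.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def make_sample_pair(background: Image.Image, text: str, xy=(20, 20),
                     font_path="DejaVuSans.ttf", font_size=24):
    """Return one (input sample, extraction target sample) pair."""
    w, h = background.size
    font = ImageFont.truetype(font_path, font_size)

    # Character layer f, RGBA, initially fully transparent ([0, 0, 0, 0]).
    layer = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    draw = ImageDraw.Draw(layer)

    # Assumed use of D(i): choose black or white strokes from the mean
    # background luminance under the text box (a simplification of the
    # per-character region Ω_i).
    lum = np.asarray(background.convert("L"), dtype=np.float32) / 255.0
    x0, y0, x1, y1 = map(int, draw.textbbox(xy, text, font=font))
    bright = lum[y0:y1, x0:x1].mean() > 0.5
    color = (0, 0, 0, 255) if bright else (255, 255, 255, 255)
    draw.text(xy, text, font=font, fill=color)

    # Input sample: f' = (1 - α) ∘ b + f, i.e. substrate-free compositing.
    input_sample = Image.alpha_composite(background.convert("RGBA"), layer)

    # Target sample t = α: white strokes on black at the same coordinates.
    alpha = layer.split()[-1]
    target_sample = alpha.point(lambda v: 255 if v > 0 else 0)
    return input_sample.convert("RGB"), target_sample
```

Anti-aliased stroke edges give fractional α values; the thresholded point() call binarizes them so the target remains a clean white-on-black mask, matching the pairing shown in fig. 2.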
Step ii comprises the following subdivided steps:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
in this embodiment, M is 4 and N is 2.
After the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
in this embodiment, K is 4 and L is 2.
The output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
after the K upsampling groups, one single convolution layer reduces the channel count to 1 to output the final features.
2-2) defining the training parameters and outputting the segmentation model
Convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
in this embodiment, downsampling starts at 64 channels and upsampling ends at 64 channels; k, s and p are 3×3, 1 and 1, respectively.
Pooling and upsampling configuration: sliding window size Window, stride s, padding p;
in this embodiment, Window, s and p are 2×2, 2 and 0, respectively.
The length and width of the original input image must be integer multiples of a;
in this embodiment, a is 16.
A sigmoid activation function is applied to the final output prediction matrix x, mapping its values into the range 0 to 1; when a feature value exceeds the threshold S, the corresponding position is a character pixel.
In this embodiment, the threshold S is 0.5; x is the real-valued prediction matrix output by the UNet network.
Step iii comprises the following subdivided steps:
3-1) using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask;
assume the original image is n and the resulting instance segmentation mask is m.
If m[i, j] = 1 (white), then n[i, j] is a character pixel; otherwise it is a background pixel.
3-2) according to the mask shown in fig. 4, graying the background region of non-time-character strokes (pixels) in the original image to be detected with a linear attenuation algorithm;
here n' denotes the image after background attenuation, and ∘ denotes a matrix operator: element-wise multiplication of two matrices of equal size.
The attenuation control coefficient k ≥ 0; the larger k is, the more pronounced the attenuation effect;
in this embodiment, k = 8.
Finally, the original image is suppressed based on the time character mask, resulting in the output shown in fig. 5.
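The attenuation formula itself is not reproduced in this text, so the NumPy sketch below uses an assumed linear attenuation (dividing the grayed background by 1 + k) that is merely consistent with the stated behaviour: graying of background pixels, k ≥ 0, stronger suppression for larger k, character pixels left intact. It is not the patent's exact formula.

```python
import numpy as np

def suppress_background(n: np.ndarray, m: np.ndarray, k: float = 8.0) -> np.ndarray:
    """n: original RGB image, shape (H, W, 3), float values in [0, 1].
    m: instance segmentation mask, shape (H, W); 1 = character pixel."""
    gray = n.mean(axis=2, keepdims=True)      # graying the background region
    attenuated = gray / (1.0 + k)             # assumed linear attenuation, k >= 0
    m3 = m[..., None].astype(n.dtype)
    # n' = m ∘ n + (1 - m) ∘ attenuated: keep strokes, suppress background
    return m3 * n + (1.0 - m3) * attenuated
```

With k = 8 the background collapses to a dark gray while the stroke pixels of n pass through unchanged, matching the contrast effect shown in fig. 5.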
Once training of the time-character extraction model is complete, the time characters and the background region of the image to be detected can be segmented at the instance level and the recognizability of the time characters enhanced, realizing the technical scheme of the invention.
It will be clear to those skilled in the art that the specific parameter values or thresholds described above may be adjusted according to the sample training method and the strictness with which the specification is enforced, and do not constitute a limitation of the present invention.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions described there or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. An enhancement method for time characters superimposed in a video image, comprising the following steps:
step i, generating customized UNet training samples in batches;
step ii, training a UNet-based model for extracting time-character pixels from images;
step iii, suppressing the background of the image to be detected based on the mask produced by the time-character extraction model, wherein step iii comprises: using the trained UNet time-character extraction model to segment the time characters from the image background in the image to be detected, obtaining a time-character instance segmentation mask; assuming the original image is n and the obtained instance segmentation mask is m; and, according to the mask, graying the background region of non-time-character strokes in the original image to be detected with a linear attenuation algorithm;
wherein n' represents the image after background attenuation, and ∘ in the formula represents a matrix operator: element-wise multiplication of two matrices of equal size.
2. The enhancement method for time characters superimposed in a video image according to claim 1, wherein the specific steps of customizing the batch of UNet training samples in step i comprise:
1-1) taking a batch of random video images as backgrounds, drawing time characters in black or white and in multiple fonts, and superimposing them without a substrate to obtain training input samples;
1-2) taking black images of the same size as backgrounds, and superimposing white time characters, identical in content and other attributes to those of the input samples, at the same coordinate positions, to obtain one-to-one corresponding extraction target samples.
3. The enhancement method for time characters superimposed in a video image according to claim 1, wherein the specific steps of training the model for extracting time-character pixels from images with UNet in step ii comprise:
2-1) setting the feature extraction convolutional network structure
M groups of 'convolution + pooling downsampling', where each group has N convolution layers, each including batch normalization (BatchNorm) and ReLU operations;
after the M pooling groups, one single convolution layer adjusts the channel count to match the subsequent upsampling;
K groups of 'upsampling + convolution', where each group has L convolution layers;
the output matrix of each upsampling layer is concatenated, in order, with the output matrix of the corresponding downsampling convolution layer;
reducing the channel count to 1 with one single convolution layer after the K upsampling groups, to output the final features;
2-2) defining the training parameters and outputting the segmentation model
convolution layer configuration: number of output channels maps, convolution kernel size k, stride s, padding p;
pooling and upsampling configuration: sliding window size Window, stride s, padding p;
the length and width of the original input image must be integer multiples of a;
and applying a sigmoid activation function to the final output prediction matrix to map its values into the range 0 to 1, a position being a character pixel when its feature value exceeds the threshold S.
CN202010422327.3A 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image Active CN111899202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010422327.3A CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Publications (2)

Publication Number Publication Date
CN111899202A CN111899202A (en) 2020-11-06
CN111899202B (en) 2024-03-15

Family

ID=73207449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010422327.3A Active CN111899202B (en) 2020-05-19 2020-05-19 Enhancement method for superimposed time character in video image

Country Status (1)

Country Link
CN (1) CN111899202B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947529B (en) * 2021-10-14 2023-01-10 万翼科技有限公司 Image enhancement method, model training method, component identification method and related equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6366699B1 (en) * 1997-12-04 2002-04-02 Nippon Telegraph And Telephone Corporation Scheme for extractions and recognitions of telop characters from video data
JP2005234786A (en) * 2004-02-18 2005-09-02 Nippon Telegr & Teleph Corp <Ntt> Video keyword extraction method, device and program
CN101546427A (en) * 2008-02-27 2009-09-30 西门子电脑辅助诊断有限公司 Method of suppressing obscuring features in an image
CN107292854A (en) * 2017-08-02 2017-10-24 大连海事大学 Grayscale image enhancement method based on local singularity quantitative analysis
CN108805042A (en) * 2018-05-25 2018-11-13 武汉东智科技股份有限公司 The detection method that road area monitor video is blocked by leaf
CN109948510A (en) * 2019-03-14 2019-06-28 北京易道博识科技有限公司 A kind of file and picture example dividing method and device
CN110659574A (en) * 2019-08-22 2020-01-07 北京易道博识科技有限公司 Method and system for outputting text line contents after status recognition of document image check box
CN111079745A (en) * 2019-12-11 2020-04-28 中国建设银行股份有限公司 Formula identification method, device, equipment and storage medium
CN111126396A (en) * 2019-12-25 2020-05-08 北京科技大学 Image recognition method and device, computer equipment and storage medium
CN111160335A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Image watermarking processing method and device based on artificial intelligence and electronic equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Automatic time stamp extraction system for home videos; Pei Yin et al.; 2002 IEEE International Symposium on Circuits and Systems (ISCAS); 1-4 *
Content Based Lecture Video Retrieval Using Speech and Video Text Information; Haojin Yang et al.; IEEE Transactions on Learning Technologies; vol. 7, no. 2; 142-154 *
Timestamp text recognition in security surveillance scenes; Liu Yang; China Master's Theses Full-text Database, Information Science and Technology; I138-350 *
Timestamp recognition in color photographs; Bao Fumin, Li Aiguo, Qin Zheng; Journal of Fudan University (Natural Science) (05); 914-917+922 *
A survey of text detection in video images; Zhou Dong'ao, Lin Jiayu; Computer Engineering & Science (04); 760-764 *

Also Published As

Publication number Publication date
CN111899202A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
US10614574B2 (en) Generating image segmentation data using a multi-branch neural network
CN106682629B (en) Identification algorithm for identity card number under complex background
CN105719247B (en) Single image to the fog method based on feature learning
CN111898606B (en) Night imaging identification method for superimposing transparent time characters in video image
CN109815948B (en) Test paper segmentation algorithm under complex scene
CN112990220B (en) Intelligent identification method and system for target text in image
CN115359370B (en) Remote sensing image cloud detection method and device, computer device and storage medium
CN113468996A (en) Camouflage object detection method based on edge refinement
CN106933579A (en) Image rapid defogging method based on CPU+FPGA
Sihotang Implementation of Gray Level Transformation Method for Sharping 2D Images
CN111899202B (en) Enhancement method for superimposed time character in video image
CN107563476B (en) Two-dimensional code beautifying and anti-counterfeiting method
CN116152173A (en) Image tampering detection positioning method and device
CN108171220A (en) Road automatic identifying method based on full convolutional neural networks Yu CRF technologies
CN113012068A (en) Image denoising method and device, electronic equipment and computer readable storage medium
CN114519788A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN108764287A (en) Object detection method and system based on deep learning and grouping convolution
CN110533027B (en) Text detection and identification method and system based on mobile equipment
CN115984133A (en) Image enhancement method, vehicle snapshot method, device and medium
CN113901913A (en) Convolution network for ancient book document image binaryzation
CN115439850A (en) Image-text character recognition method, device, equipment and storage medium based on examination sheet
CN111402223B (en) Transformer substation defect problem detection method using transformer substation video image
CN114155541A (en) Character recognition method and device, terminal equipment and storage medium
CN115100060A (en) Low-illumination traffic image enhancement method based on image enhancement model
CN112907605A (en) Data enhancement method for instance segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant