CN113920455A - Night video coloring method based on deep neural network - Google Patents
Night video coloring method based on deep neural network
- Publication number: CN113920455A
- Application number: CN202111009898.5A
- Authority: CN (China)
- Prior art keywords: image, coloring, network, full, neural network
- Prior art date: 2021-08-31
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F18/253: Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/045: Neural networks; architectures; combinations of networks
- G06N3/047: Neural networks; architectures; probabilistic or stochastic networks
- G06N3/08: Neural networks; learning methods
Abstract
The invention relates to a night video coloring method based on a deep neural network, which comprises the following steps: S1, establishing a target detection neural network model, inputting the video image to be processed, and using a target detection algorithm to detect target instances and generate cropped target images; S2, establishing a coloring network, performing instance coloring and full-image coloring by constructing two end-to-end trainable backbone networks, an instance coloring network and a full-image coloring network, whose corresponding levels are constructed as a fully convolutional neural network trained end to end; and S3, establishing a fusion module that selectively fuses the features extracted by the instance coloring network and the full-image coloring network, finally obtaining the colored night video image. By passing the input video image through the target detection network, the fully convolutional instance-coloring and full-image-coloring networks, and the fusion module, the invention obtains the colored night video image.
Description
Technical Field
The invention relates to the technical field of image analysis in computer vision, and in particular to a night video coloring method based on a deep neural network.
Background
In recent years, with the continuous development of computer vision technology, target detection network models have received more and more attention, and combining a target detection network model with a coloring network model has become a research hotspot. Most such combinations, however, are applied to coloring still pictures, for example in old-photo restoration; a gap remains in video coloring technology, especially in the coloring of surveillance video.
Automatically converting grayscale images into realistic color images is an intensely studied subject in computer vision and graphics. However, predicting the two missing channels from a given single-channel grayscale image is inherently an ill-posed problem. Furthermore, the coloring task is multi-modal, in that there are many plausible options for coloring an object; for example, an automobile may be white, black, or red. Image coloring therefore remains a challenging research problem.
At present there are many night surveillance videos and black-and-white surveillance videos whose colors cannot be presented well, so that the targets in them cannot be processed properly and the footage becomes invalid data, which brings much inconvenience to computer vision applications. Moreover, coloring black-and-white video places very high demands on prediction accuracy.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a night video coloring method based on a deep neural network.
The invention is realized by adopting the following technical scheme: a night video coloring method based on a deep neural network comprises the following steps:
S1, establishing a target detection neural network model: inputting the video image to be processed, using a target detection algorithm to detect target instances and generate cropped target images, extracting the target images, feeding the extracted target image features into a fully connected network for classification and regression, sending the classification and regression results into a fully convolutional network, and recovering the category of each pixel from the target image features;
s2, establishing a coloring network, and performing instance coloring and full image coloring by constructing two end-to-end training backbone networks including an instance coloring network and a full image coloring network; constructing two coloring network corresponding levels, and performing end-to-end training on the full convolution neural network;
s3, establishing a fusion module, performing three-layer convolution on the features extracted from the example coloring network and the full image coloring network to obtain a full image weight map and an example weight map, changing the example image features and the example weight map into the size of the full image features according to coordinates, and finally performing weighted fusion on the full image features and each group of example image features according to corresponding weight maps to obtain a colored night video image.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention obtains the colored night video image by passing the input video image through a target detection network, fully convolutional instance-coloring and full-image-coloring networks, and a fusion module.
2. The method can color night surveillance videos in various scenes with good coloring effect and high accuracy, restoring the original colors of the scene well; black-and-white surveillance video is thereby no longer invalid, meaningless training data, which expands the scale of usable data sets. The method has good extensibility and adaptability and can be applied in many scenes and fields.
3. The invention realizes instance-aware coloring without having to handle interference from complex background noise.
4. The invention uses the localized objects as input, allowing the instance coloring network to learn object-level representations for accurate coloring while avoiding confusion between object colors and the background.
Drawings
FIG. 1 is an overall framework of the present invention;
FIG. 2 is a diagram of an object detection framework of the present invention;
FIG. 3 is a target detection subject network architecture of the present invention;
FIG. 4 is a diagram of the coloring network framework of the present invention;
FIG. 5 is a diagram of the fusion module framework of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Embodiment
As shown in fig. 1, the night video coloring method based on the deep neural network of the present embodiment mainly includes the following steps:
s1, establishing a target detection neural network model, inputting a video image to be processed, detecting a target instance and generating a cut target image by using a target detection algorithm, extracting the target image, outputting the extracted image characteristics, sending the output image characteristics into a full-connection network of 1024 neurons for classification and regression, sending the classification and regression results into a full convolution network, and recovering the category of each pixel from the abstract characteristics; the bottom layer of the pre-trained convolutional neural network detects low-level features including edges, corners and the like, and the higher layer detects higher-level features, such as: car, person, sky, etc.; after the forward propagation of the convolutional neural network, selecting high-level features by using an FPN method and transmitting the high-level features to a bottom layer to combine the features of each level with the high-level and low-level features; cutting out corresponding gray scale example images and color example images after detecting the boundary frame of each object, and converting the size of the cut images into 256x256 resolution;
s2, establishing a coloring network, and realizing the functions of example coloring and full image coloring by constructing two end-to-end training backbone networks, namely an example coloring network and a full image coloring network; constructing two coloring network corresponding levels, namely a full convolution neural network for end-to-end training;
s3, establishing a fusion module, performing three-layer convolution on the features extracted from the example coloring network and the full image coloring network to obtain a full image weight map and an example weight map, changing the example image features and the example weight map into the size of the full image features according to coordinates, and finally performing weighted fusion on the full image features and each group of example image features according to corresponding weight maps to finally obtain a colored night video image.
As shown in fig. 2, in this embodiment, the specific steps of establishing the target detection neural network model in step S1 are as follows:
s11, setting a main network layer, as shown in figure 3, using a training or predicted original picture as an input unit, carrying out multi-scale detection on the picture through three steps of bottom-to-top connection, top-to-bottom connection and transverse connection, and fusing the features of each level of the picture to enable the picture to have strong semantic information and strong spatial information at the same time so as to achieve a stronger feature learning effect;
and S12, setting a head network layer, dividing the feature map extracted and acquired in the step S11 into 7x7 grids, performing bilinear interpolation on each grid to obtain four points, performing maximum pooling after interpolation to obtain a final 7x7 feature map as input, and performing full convolution neural network processing to obtain a fixed size to complete Mask prediction.
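The 7×7 grid sampling with bilinear interpolation described in step S12 corresponds to the RoIAlign operation. A minimal sketch using torchvision's roi_align is given below; this is an illustration, not the patent's implementation, and note that torchvision averages the sampled points within each cell, whereas the text above describes max pooling:

```python
import torch
from torchvision.ops import roi_align

# One feature map with 256 channels at 1/16 of the input resolution
# (shapes here are illustrative).
features = torch.randn(1, 256, 64, 64)

# A single region of interest as (batch_index, x1, y1, x2, y2),
# given in input-image coordinates.
rois = torch.tensor([[0.0, 32.0, 48.0, 256.0, 320.0]])

# Divide each RoI into a 7x7 grid, sample 2x2 = four points per cell by
# bilinear interpolation, and pool the samples within each cell.
pooled = roi_align(
    features, rois,
    output_size=(7, 7),
    spatial_scale=1.0 / 16.0,  # maps image coordinates onto the feature map
    sampling_ratio=2,          # four sample points per grid cell
    aligned=True,
)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```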
In this embodiment, the specific processes of the bottom-up connection, the top-down connection, and the lateral connection in step S11 are as follows:
Bottom-up connection: this realizes the conventional feature extraction process; the backbone is divided into five stages according to feature map size, and the last-layer outputs of stages 2, 3, 4, and 5 are defined as C2, C3, C4, and C5, respectively;
Top-down connection: upsampling starts from the highest level and directly uses nearest-neighbor upsampling, which reduces the number of training parameters;
Lateral connection: the upsampling result of the top-down connection is fused with the feature map of the same size generated by the bottom-up connection. Each of C2, C3, C4, and C5 is passed through a 1×1 convolution with no activation function to reduce the number of channels, with the output set to 256 channels; the result is then added element-wise to the upsampled feature map; after fusion, the fused feature map is processed with a 3×3 convolution kernel to eliminate the aliasing effect of upsampling.
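A minimal sketch of one such top-down step with its lateral connection, following the description above (the module and variable names are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNLateralStep(nn.Module):
    """One FPN top-down step: upsample the level above, add the lateral
    1x1 projection of Ck, then smooth with a 3x3 convolution."""

    def __init__(self, in_channels: int, out_channels: int = 256):
        super().__init__()
        # 1x1 convolution with no activation: only reduces the channel count.
        self.lateral = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # 3x3 convolution applied after fusion to eliminate upsampling aliasing.
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c_k: torch.Tensor, p_above: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbor upsampling: no learned parameters, as in the text.
        upsampled = F.interpolate(p_above, size=c_k.shape[-2:], mode="nearest")
        fused = self.lateral(c_k) + upsampled  # element-wise addition
        return self.smooth(fused)

# Illustrative use with C4 (512 channels) and the 256-channel level above it.
c4 = torch.randn(1, 512, 28, 28)
p5 = torch.randn(1, 256, 14, 14)
p4 = FPNLateralStep(512)(c4, p5)
print(p4.shape)  # torch.Size([1, 256, 28, 28])
```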
As shown in fig. 4, in this embodiment, the specific steps of establishing the coloring network in step S2 are as follows:
s21, establishing an example coloring network for coloring example images, cutting targets and coordinates obtained by using a pre-training detection network Mask RCNN as a target detector into example images, inputting the example images into the example coloring network, learning deep semantic information of the images through the convolutional neural network by establishing a convolutional neural network with a plurality of intermediate layers, including convolutional layers, pooling layers, ReLu layers and batch normalization processing, and finally obtaining predicted color example images;
s22, establishing a full image coloring network for coloring the complete image, wherein the structure of the full image coloring network is similar to that of an example coloring network, and the consistent network structure is adopted, so that the corresponding levels of the two subsequent coloring networks are convenient to fuse, then inputting the original gray image into the full image coloring network, extracting the full image characteristics after the multi-layer network structure is carried out, and finally obtaining the predicted complete image.
In this embodiment, the concrete process of the example image coloring and the complete image coloring is as follows: on the basis of a gray level map, a neural network is guided in a point input color information mode and is directly mapped into a full convolution neural network, so that a reasonable color result is quickly generated, and the full convolution neural network can be colored by fusing low-level clues and high-level semantic information after being learned from a large amount of data.
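A minimal sketch of this shared backbone idea, assuming the standard CIE Lab formulation (an assumption: the patent does not name a color space): the network takes the luminance channel, optionally concatenated with point color hints, and predicts the two missing chroma channels. Layer counts and widths are illustrative only:

```python
import torch
import torch.nn as nn

class ColorizationNet(nn.Module):
    """Fully convolutional colorization backbone: grayscale input (1 channel)
    plus optional point hints (2 chroma channels + 1 hint mask), predicting
    a 2-channel chroma output."""

    def __init__(self, use_hints: bool = True):
        super().__init__()
        in_ch = 1 + (3 if use_hints else 0)  # L channel + (ab hints, mask)

        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),  # batch normalization, per the text
                nn.ReLU(inplace=True),
            )

        self.encoder = nn.Sequential(
            block(in_ch, 64), nn.MaxPool2d(2),  # pooling layer
            block(64, 128), nn.MaxPool2d(2),
            block(128, 256),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            block(256, 64),
            nn.Conv2d(64, 2, 1),  # predict the two missing chroma channels
            nn.Tanh(),            # keep chroma in a bounded range
        )

    def forward(self, gray, hints=None):
        x = gray if hints is None else torch.cat([gray, hints], dim=1)
        return self.decoder(self.encoder(x))

# A 256x256 grayscale instance crop with an all-zero (empty) hint tensor.
net = ColorizationNet()
gray = torch.randn(1, 1, 256, 256)
hints = torch.zeros(1, 3, 256, 256)
ab = net(gray, hints)
print(ab.shape)  # torch.Size([1, 2, 256, 256])
```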
As shown in fig. 5, in this embodiment, the specific steps of establishing the fusion module in step S3 are as follows:
s31, performing feature extraction and convolution, taking the full image feature Fx and the example image feature Fxi after the whole image is colored as input, and performing three-layer convolution extraction to obtain a full image weight map WF and an example image weight map WIi;
s32, adjusting the example image feature image, and converting the example image feature Fxi and the example image weight map WIi into the size of the full image feature Fx through stretching according to specific coordinates in a zero padding mode;
s33, performing weighted fusion, and performing Softmax weighted fusion on the full image feature Fx and the group of example image features Fxi according to the corresponding weight map for each pixel, wherein the specific formula is as follows:
where N is the number of instances.
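The formula itself is not reproduced in this text. A plausible reconstruction from the surrounding description (per-pixel softmax normalization over the full-image weight map and the N resized instance weight maps, followed by weighted summation), offered here as an assumption rather than the patent's exact notation:

```latex
\tilde{F}^{X} = F^{X} \odot \operatorname{softmax}(W_F)
              + \sum_{i=1}^{N} \bar{F}^{X_i} \odot \operatorname{softmax}(\bar{W}_{I_i})
```

Here the softmax is taken per pixel across the N+1 stacked weight maps, \bar{F}^{X_i} and \bar{W}_{I_i} denote the instance feature Fxi and weight map WIi after resizing and zero-padding to full-image size, and \odot is element-wise multiplication. A minimal sketch of the same fusion in code (shapes and names are illustrative):

```python
import torch

def fuse(full_feat, full_w, inst_feats, inst_ws):
    """Per-pixel softmax-weighted fusion of a full-image feature with N
    instance features that were already resized/zero-padded to full size.
    full_feat: (C, H, W); full_w: (1, H, W)
    inst_feats: (N, C, H, W); inst_ws: (N, 1, H, W)"""
    # Stack the N+1 weight maps and normalize them per pixel with softmax.
    weights = torch.softmax(
        torch.cat([full_w.unsqueeze(0), inst_ws], dim=0), dim=0)    # (N+1, 1, H, W)
    feats = torch.cat([full_feat.unsqueeze(0), inst_feats], dim=0)  # (N+1, C, H, W)
    return (weights * feats).sum(dim=0)                             # (C, H, W)

fused = fuse(torch.randn(256, 64, 64), torch.randn(1, 64, 64),
             torch.randn(3, 256, 64, 64), torch.randn(3, 1, 64, 64))
print(fused.shape)  # torch.Size([256, 64, 64])
```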
The above embodiment is a preferred embodiment of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the scope of protection of the present invention.
Claims (7)
1. A night video coloring method based on a deep neural network, characterized by comprising the following steps:
S1, establishing a target detection neural network model: inputting the video image to be processed, using a target detection algorithm to detect target instances and generate cropped target images, extracting the target images, feeding the extracted target image features into a fully connected network for classification and regression, sending the classification and regression results into a fully convolutional network, and recovering the category of each pixel from the target image features;
S2, establishing a coloring network: performing instance coloring and full-image coloring by constructing two end-to-end trainable backbone networks, an instance coloring network and a full-image coloring network, whose corresponding levels are constructed as a fully convolutional neural network trained end to end;
S3, establishing a fusion module: applying three convolutional layers to the features extracted by the instance coloring network and the full-image coloring network to obtain a full-image weight map and instance weight maps, resizing the instance image features and instance weight maps to the size of the full-image features according to the instance coordinates, and finally performing weighted fusion of the full-image features with each group of instance image features according to the corresponding weight maps to obtain the colored night video image.
2. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the target detection neural network model in step S1 are as follows:
s11, setting a main network layer, wherein the main network framework takes a trained or predicted original picture as an input unit, performs multi-scale detection on the picture through three steps of bottom-to-top connection, top-to-bottom connection and transverse connection, and fuses the characteristics of each level of the picture;
and S12, setting a head network layer, dividing the feature map extracted and acquired in the step S11 into 7x7 grids, performing bilinear interpolation on each grid to obtain four points, performing maximum pooling after interpolation to obtain a final 7x7 feature map as input, and performing full convolution neural network processing to obtain a fixed size to complete Mask prediction.
3. The night video coloring method based on the deep neural network as claimed in claim 2, wherein the detailed process of the three steps of bottom-up connection, top-down connection and horizontal connection in the step S11 is as follows:
connecting from bottom to top, dividing the complete picture into 5 blocks according to the size of the feature map, wherein the outputs of the last layers of 2, 3, 4 and 5 in the feature map are respectively defined as C2, C3, C4 and C5;
the up-sampling is carried out from the highest layer, and the up-sampling utilizes nearest neighbor up-sampling;
transverse connection, namely fusing an up-sampling result in the top-down connection with a feature map with the same size generated by the bottom-up connection; and performing convolution operation and no-activation function operation on each layer of C2, C3, C4 and C5 by 1x1, setting an output channel as 256 channels, then performing summation operation on the 256 channels and the up-sampled feature map, and processing the fused feature map by using 3 x 3 convolution kernel after fusion so as to eliminate the aliasing effect of up-sampling.
4. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific process of detecting the target instance and generating the cropped target image is as follows: after the bounding box of each object is detected, the corresponding gray scale example image and color example image are cropped, and the cropped image is converted to 256 × 256 resolution.
5. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the coloring network in step S2 are as follows:
s21, establishing an example coloring network for example image coloring, acquiring a target and coordinates by using a pre-training detection network Mask RCNN as a target detector, cutting the target and the coordinates into an example image, inputting the example image into the example coloring network, establishing a convolutional neural network with a plurality of intermediate layers, including a convolutional layer, a pooling layer, a ReLu layer and batch normalization processing, learning deep semantic information of the image through the convolutional neural network, and finally obtaining a predicted color example image;
and S22, establishing a full image coloring network for coloring the complete image, inputting the original image into the full image coloring network, extracting the full image characteristics after the full image coloring network passes through a multi-layer network structure, and finally obtaining the predicted complete image.
6. The night video coloring method based on the deep neural network as claimed in claim 5, wherein the specific process of example image coloring and complete image coloring is as follows: on the basis of a gray level map, a neural network is guided in a point input color information mode and is directly mapped into a full convolution neural network, and the full convolution neural network is colored by fusing low-level clues and high-level semantic information after being learned from a large amount of data.
7. The night video coloring method based on the deep neural network as claimed in claim 1, wherein the specific steps of establishing the fusion module in step S3 are as follows:
s31, performing feature extraction and convolution, taking the full image feature Fx and the example image feature Fxi after the whole image is colored as input, and performing three-layer convolution extraction to obtain a full image weight map WF and an example image weight map WIi;
s32, adjusting the example image feature image, and converting the example image feature Fxi and the example image weight map WIi into the size of the full image feature Fx through stretching according to specific coordinates in a zero padding mode;
s33, performing weighted fusion, and performing Softmax weighted fusion on the full image feature Fx and the group of example image features Fxi according to the corresponding weight map for each pixel, wherein the specific formula is as follows:
where N is the number of instances.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111009898.5A | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111009898.5A | 2021-08-31 | 2021-08-31 | Night video coloring method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113920455A | 2022-01-11 |
CN113920455B | 2024-08-06 |
Family
ID=79233524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111009898.5A (granted as CN113920455B, Active) | Night video coloring method based on deep neural network | 2021-08-31 | 2021-08-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113920455B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100681A (en) * | 2022-06-24 | 2022-09-23 | Jinan University | Clothes identification method, system, medium and equipment |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107067452A (en) * | 2017-02-20 | 2017-08-18 | Tongji University | 2D-to-3D film conversion method based on fully convolutional neural networks |
CN110651301A (en) * | 2017-05-24 | 2020-01-03 | HELLA GmbH & Co. KGaA | Method and system for automatically coloring night vision images |
US20200167972A1 (en) * | 2017-05-24 | 2020-05-28 | HELLA GmbH & Co. KGaA | Method and system for automatically colorizing night-vision images |
CN109584248A (en) * | 2018-11-20 | 2019-04-05 | Xidian University | Infrared target instance segmentation method based on feature fusion and densely connected networks |
CN112417190A (en) * | 2020-11-27 | 2021-02-26 | Jinan University | Retrieval method and application of ciphertext JPEG image |
Also Published As
Publication number | Publication date |
---|---|
CN113920455B (en) | 2024-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110956094B (en) | RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network | |
CN112163449B (en) | Lightweight multi-branch feature cross-layer fusion image semantic segmentation method | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
JP4708909B2 (en) | Method, apparatus and program for detecting object of digital image | |
CN112801027B (en) | Vehicle target detection method based on event camera | |
US20200279166A1 (en) | Information processing device | |
CN107273870A (en) | The pedestrian position detection method of integrating context information under a kind of monitoring scene | |
CN114155527A (en) | Scene text recognition method and device | |
CN113762409A (en) | Unmanned aerial vehicle target detection method based on event camera | |
JP7490359B2 (en) | Information processing device, information processing method, and program | |
CN114399734A (en) | Forest fire early warning method based on visual information | |
CN111582074A (en) | Monitoring video leaf occlusion detection method based on scene depth information perception | |
CN111861880A (en) | Image super-fusion method based on regional information enhancement and block self-attention | |
CN116342894B (en) | GIS infrared feature recognition system and method based on improved YOLOv5 | |
CN111079864A (en) | Short video classification method and system based on optimized video key frame extraction | |
CN116895098A (en) | Video human body action recognition system and method based on deep learning and privacy protection | |
CN116645592A (en) | Crack detection method based on image processing and storage medium | |
CN114627269A (en) | Virtual reality security protection monitoring platform based on degree of depth learning target detection | |
CN114549391A (en) | Circuit board surface defect detection method based on polarization prior | |
CN116030361A (en) | CIM-T architecture-based high-resolution image change detection method | |
CN111881915A (en) | Satellite video target intelligent detection method based on multiple prior information constraints | |
CN115019340A (en) | Night pedestrian detection algorithm based on deep learning | |
CN113920455B (en) | Night video coloring method based on deep neural network | |
US11481919B2 (en) | Information processing device | |
CN117789077A (en) | Method for predicting people and vehicles for video structuring in general scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |