CN112258537B - Method for supervised scotopic vision image edge detection based on convolutional neural network - Google Patents

Method for supervised scotopic vision image edge detection based on convolutional neural network

Info

Publication number
CN112258537B
CN112258537B (application CN202011161185.6A; published as CN112258537A)
Authority
CN
China
Prior art keywords
image
convolution
edge detection
size
edge
Prior art date
Legal status
Active
Application number
CN202011161185.6A
Other languages
Chinese (zh)
Other versions
CN112258537A
Inventor
赵志强
张琴
陶于祥
陈阔
钱鹰
黄颖
何帆
王少志
徐晓文
Current Assignee
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202011161185.6A
Publication of CN112258537A
Application granted
Publication of CN112258537B
Legal status: Active

Classifications

    • G06T 7/13 — Edge detection
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Combinations of networks
    • G06T 3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 7/90 — Determination of colour characteristics
    • G06T 2200/32 — Indexing scheme involving image mosaicing
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the field of image processing and particularly relates to a method for supervised scotopic vision image edge detection based on a convolutional neural network, comprising: acquiring a scotopic vision image and inputting it into a trained supervised image edge detection model to perform scotopic vision image edge detection, obtaining an edge detection result. The supervised image edge detection model is an optimized model composed of six edge detection modules and a splicing module. By deepening the convolutional layers of the model and introducing residual structure units, the invention better retains the features and edge-detail information learned in the earlier stage, so that the trained optimized model further improves the edge detection effect, strengthens the continuity of image edges, and outputs edge detection images that better match human visual perception.

Description

Method for supervised scotopic vision image edge detection based on convolutional neural network
Technical Field
The invention belongs to the fields of image processing, computer vision and deep learning, and particularly relates to a method for supervised scotopic vision image edge detection based on a convolutional neural network.
Background
Edge detection is a classic problem in the image and computer vision field and a preliminary step for many tasks, such as image segmentation and image recognition; it is widely applied in many fields, such as medical imaging and image localization. Although edge detection methods and algorithms have achieved some success, the problem is still worth further exploration.
The original purpose of image edge detection is to extract the edge or contour of an object to facilitate subsequent applications. However, extracting edges and retaining detail information are often contradictory goals, and the balance between them varies by task, so different edge detection schemes need to be designed for different target tasks. In deep learning, a relatively uniform distribution of samples in the training set is usually required to obtain a good training result. In most cases, however, edge pixels occupy only a small portion of an image, which leads to an extreme imbalance between positive and negative samples in the training set and makes training a deep learning model difficult. In particular, images captured in a scotopic vision environment have low contrast, which poses an even more serious challenge to edge extraction.
Disclosure of Invention
In order to solve the problems of the prior art, the invention provides a method for detecting the edge of a supervised scotopic vision image based on a convolutional neural network, which comprises the following steps: acquiring a scotopic vision image, and inputting the scotopic vision image into a trained optimized supervised image edge detection model to perform scotopic vision image edge detection to obtain an edge detection result; the optimized supervised image edge detection model consists of six edge detection modules and a splicing module;
the process of training the optimization supervision image edge detection model comprises the following steps:
s1: acquiring a scotopic vision original image data set and an edge labeling image data set of the same scene under a normal illumination condition, dividing the data sets into a training set and a testing set, and simultaneously performing data enhancement processing on a scotopic vision original image in the training set and an edge labeling image of the same scene under the normal illumination condition to obtain an amplified training sample set;
s2: inputting the images of the augmented training set into the optimized supervised image edge detection model to obtain the intermediate effect maps (side outputs) of the six edge detection modules;
s3: splicing the six side-output effect maps, converting the spliced result into a three-dimensional image, and convolving the three-dimensional image to obtain the edge detection image;
s4: calculating a loss function of the optimized supervised image edge detection model, and calculating the error between the obtained edge detection image in the training process and the edge marking image of the same scene under the normal illumination condition according to the loss function;
s5: continuously adjusting the weight of the loss function, and saving the training weight parameter of the model when the value of the loss function is minimum;
s6: inputting the data in the test set into an optimized supervised image edge detection model for testing;
s7: and outputting an edge detection result, and finishing the model training.
Preferably, the scotopic vision image data set is constructed as follows: the R, G and B channels of each image under normal illumination are extracted to obtain images R1, G1 and B1; the gray levels of R1, G1 and B1 are linearly mapped to the range 0-47 to obtain images R2, G2 and B2; R2, G2 and B2 are recombined to obtain an image in a dark-vision environment; and the recombined images are collected to form the scotopic vision image data set.
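The channel-scaling procedure above can be sketched in a few lines (a minimal numpy sketch; the 0-47 range comes from the text, while the function name and the rounding choice are our assumptions):

```python
import numpy as np

def simulate_scotopic(image: np.ndarray) -> np.ndarray:
    """Linearly compress an 8-bit RGB image's gray levels from 0-255
    into 0-47 per channel, simulating a dark-vision (scotopic) scene."""
    # Split into R, G, B channels (R1, G1, B1 in the text).
    r1, g1, b1 = image[..., 0], image[..., 1], image[..., 2]
    # Linear change of gray levels to the range 0-47 (R2, G2, B2).
    scale = 47.0 / 255.0
    r2 = np.round(r1 * scale).astype(np.uint8)
    g2 = np.round(g1 * scale).astype(np.uint8)
    b2 = np.round(b1 * scale).astype(np.uint8)
    # Recombine the channels into the dark-vision image.
    return np.stack([r2, g2, b2], axis=-1)

bright = np.full((4, 4, 3), 255, dtype=np.uint8)
dark = simulate_scotopic(bright)  # every pixel maps to gray level 47
```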
Preferably, the structure of the six edge detection modules includes:
the first module comprises two convolution layers, the number of convolution kernels of the first layer is 32, the size of a convolution window is 3 x 3, the step size is 2 x 2, and an activation function is a relu function; the number of the convolution kernels of the second layer is 64, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the second module comprises two convolution layers, the number of convolution kernels of the first layer is 128, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function; the number of the convolution kernels in the second layer is 128, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the third module comprises a convolution layer and two same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: the number of convolution kernels is 256, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the fourth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the fifth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 512, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the sixth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: a relu activation function, followed by a first convolution layer with 128 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and a relu activation function, and a second convolution layer with 256 convolution kernels, a convolution window of 3 x 3 and a step size of 1 x 1.
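As a sanity check on the module list above, only the stride-2 layers change spatial resolution under SAME padding. Tracing them (the input size and main-path layout are our assumptions based on the detailed description) shows each module's side-output resolution matching the upsampling factors 2, 2, 4, 8, 16, 16 used later:

```python
import math

def same_conv(size: int, stride: int) -> int:
    """Spatial size after a SAME-padded convolution."""
    return math.ceil(size / stride)

H = 512  # assumed input height; the patent does not fix an input size

# Side-output resolution of each module (stride-1 layers keep the size):
s1 = same_conv(H, 2)   # module 1: its first conv has stride 2
s2 = s1                # module 2: stride-1 convs on conv1_1
s3 = same_conv(s2, 2)  # module 3: fed through the stride-2 conv conv2_2
s4 = same_conv(s3, 2)  # module 4: entry conv has stride 2
s5 = same_conv(s4, 2)  # module 5: entry conv has stride 2
s6 = s5                # module 6: stride-1 layers only

# Each side output is upsampled back to the input resolution by the
# sampling sizes given in the detailed description:
factors = [2, 2, 4, 8, 16, 16]
sizes = [s1, s2, s3, s4, s5, s6]
assert all(s * f == H for s, f in zip(sizes, factors))
```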
Preferably, the splicing module is used for splicing the edge detection effect maps obtained by the six edge detection modules; the spliced image is converted into a three-dimensional image using a concatenation (concat) function, and the three-dimensional image is convolved with 1 convolution kernel, a convolution window of 1 x 1 and a step size of 1 x 1.
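A 1 x 1 convolution over the spliced maps is, per pixel, just a learned weighted sum of the six channels. A minimal numpy sketch (random weights stand in for the trained kernel, and the bias value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8
# Six side-output maps from the edge detection modules, all upsampled
# to full resolution.
outputs = [rng.random((H, W)) for _ in range(6)]

# Splice (concatenate) along a new channel axis: shape (H, W, 6).
spliced = np.stack(outputs, axis=-1)

# A 1x1 convolution with a single kernel collapses the 6 channels
# into one fused edge map: a per-pixel weighted sum plus a bias.
w = rng.random(6)  # stand-in for the trained 1x1 kernel
b = 0.0
edge_map = spliced @ w + b
```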
Preferably, the images in the scotopic vision image dataset are enhanced by random cropping and flip/rotation processing.
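The enhancement step can be sketched as follows (a numpy sketch; the crop size and flip probability are our assumptions, and the same transform is applied to the image and its edge annotation so the pair stays aligned):

```python
import numpy as np

def augment(img: np.ndarray, label: np.ndarray, rng, crop: int = 16):
    """Apply the same random crop / flip / rotation to a dark-vision
    image and its edge-annotation label (crop size is an assumption)."""
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    img, label = img[y:y+crop, x:x+crop], label[y:y+crop, x:x+crop]
    if rng.random() < 0.5:                   # random horizontal flip
        img, label = img[:, ::-1], label[:, ::-1]
    k = int(rng.integers(0, 4))              # rotate by k * 90 degrees
    return np.rot90(img, k), np.rot90(label, k)

rng = np.random.default_rng(1)
img = np.zeros((32, 32, 3), dtype=np.uint8)
lbl = np.zeros((32, 32), dtype=np.uint8)
a_img, a_lbl = augment(img, lbl, rng)
```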
Preferably, the loss function of the optimized supervised image edge detection model is:
[loss-function formula shown only as an image in the original]
further, the coefficient of the loss function is optimized, and the optimization formula of the coefficient of the loss function is as follows:
[coefficient-optimization formula shown only as an image in the original]
furthermore, a weight lambda for the positive and negative samples is set; its initial value is set in the range 0.6-1.2 and is continuously updated during the training of the model.
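The patent's loss formula appears only as an image in the source, so it cannot be reproduced exactly; a common class-balanced cross-entropy of the kind used for edge detection (e.g. in HED), with the positive/negative weight lambda described above, would look like this (a hedged sketch under that assumption, not the patent's exact formula):

```python
import numpy as np

def balanced_bce(pred: np.ndarray, target: np.ndarray, lam: float = 0.9) -> float:
    """Class-balanced binary cross-entropy: lam weights the scarce edge
    (positive) pixels, (1 - lam) weights the abundant background pixels."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    pos = target == 1
    neg = ~pos
    loss_pos = -lam * np.log(pred[pos]).sum()
    loss_neg = -(1.0 - lam) * np.log(1.0 - pred[neg]).sum()
    return float((loss_pos + loss_neg) / target.size)

target = np.array([[1, 0], [0, 0]], dtype=np.uint8)
pred = np.array([[0.9, 0.1], [0.2, 0.1]])
loss = balanced_bce(pred, target)  # lam starts in 0.6-1.2 per the text
```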
Preferably, the error between the edge detection image and the edge-annotation image of the same scene under normal illumination is calculated as follows: the set of edge points of the edge-annotation image of the same scene under normal illumination is acquired; the set of edge points of the model's edge detection image is acquired; and the error between the edge detection image and the edge-annotation image is calculated according to an error formula;
the error formula is:
[error formula shown only as an image in the original]
the invention has the beneficial effects that the invention provides a monitoring edge detection method based on the convolutional neural network, which is more suitable for the edge detection of the scotopic vision image. According to the invention, by adding the convolution layer of the model and introducing the residual error structure unit, the characteristics and edge detail information of one-stage learning on the image can be better reserved, so that the trained model can further improve the image edge detection effect, the PSNR and MSE evaluation indexes are improved, the continuity of the image edge is enhanced, and the edge detection image output by the model is more in line with the human eye observation effect.
Drawings
FIG. 1 is a schematic flow chart of the supervised image edge detection method based on convolutional neural network of the present invention;
FIG. 2 is an overall model network structure of the convolutional neural network-based supervised image edge detection method of the present invention;
FIG. 3 is a graph of the validation effect on the test set of the public edge-annotation dataset;
fig. 4 is a diagram of the edge detection effect of the image captured in the actual dark vision environment.
Detailed Description
The technical solutions and advantages of the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, and the present invention will be further described in detail. It should be understood that the described embodiments are merely exemplary and not restrictive of the full scope of the invention. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
A method for detecting edges of a supervised scotopic vision image based on a convolutional neural network, as shown in fig. 1, the method comprising: acquiring a scotopic vision image, inputting the scotopic vision image into a trained optimized supervised image edge detection model for performing scotopic vision image edge detection to obtain a detection result; the optimized supervised image edge detection model consists of six edge detection modules and a splicing module;
the process of training the optimization supervision image edge detection model comprises the following steps:
s1: acquiring a scotopic vision original image data set and an edge label image data set of the same scene under a normal illumination condition, dividing the data sets into a training set and a testing set, and simultaneously performing data enhancement processing on a scotopic vision original image in the training set and an edge label image of the same scene under the normal illumination condition to obtain an amplified training sample set;
s2: inputting the images in the augmented training set into an optimized supervised image edge detection model;
s3: obtaining effect process diagrams of six edge detection modules, splicing the process diagrams of six edge detection effects, converting spliced images into three-dimensional images, and performing convolution on the three-dimensional images to obtain edge detection images;
s4: calculating a loss function of the optimized supervised image edge detection model, and calculating the error between the obtained edge detection image in the training process and the edge marking image of the same scene under the normal illumination condition according to the loss function;
s5: continuously adjusting the weight of the loss function, and saving the training weight parameter of the model when the value of the loss function is minimum;
s6: inputting the data in the test set into an optimized supervised image edge detection model for testing;
s7: and outputting an edge detection result, and finishing the model training.
Preferably, a scotopic vision original image dataset and an edge-annotation image dataset of the same scene under normal illumination conditions are obtained and divided into a training set and a testing set; meanwhile, some images shot in an actual scotopic vision environment are prepared as a verification set.
Specifically, the public BIPED edge-annotation dataset is linearly transformed to obtain a BIPED dataset in a dark-vision environment, called the scotopic_BIPED dataset. The training set comprises 200 original images and the corresponding edge-annotation images, the test set comprises 50 original images and the corresponding edge-annotation images, and 50 images shot in an actual dark-vision environment are collected as a verification set.
Preferably, the process of acquiring the scotopic vision image dataset further comprises: extracting the R, G and B channels of the images under normal illumination to obtain images R1, G1 and B1; linearly mapping the gray levels of R1, G1 and B1 to the range 0-47 to obtain images R2, G2 and B2; recombining R2, G2 and B2 to obtain an image in a dark-vision environment; and collecting the recombined images to obtain the scotopic vision image dataset.
The structure of six edge detection modules of the optimized supervised image edge detection model comprises:
the first module comprises two convolution layers, the number of convolution kernels of the first layer is 32, the size of a convolution window is 3 x 3, the step size is 2 x 2, and an activation function is a relu function; the number of convolution kernels in the second layer is 64, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the second module comprises two convolution layers, the number of convolution kernels of the first layer is 128, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function; the number of the convolution kernels in the second layer is 128, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the third module comprises a convolution layer and two same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: the number of convolution kernels is 256, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the fourth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the fifth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 512, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the sixth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: a relu activation function, followed by a first convolution layer with 128 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and a relu activation function, and a second convolution layer with 256 convolution kernels, a convolution window of 3 x 3 and a step size of 1 x 1.
The images in the scotopic vision image dataset are enhanced by random cropping, flipping and rotation. In this way, the training sample set in the dataset can be expanded.
As shown in fig. 2, the process of inputting the image data in the data set into the optimized supervised image edge detection model, wherein each structure of the model processes the image comprises the following steps:
1) a first edge detection module: the input image is a scotopic vision image subjected to data enhancement processing, the scotopic vision image passes through two convolution layers, the number of convolution kernels of a first layer is 32, the size of a convolution window is 3 x 3, the step size is 2 x 2, the filling mode is SAME, then normalization is carried out, the activation function is a relu function, the number of convolution kernels of a second layer is 64, the size of the convolution window is 3 x 3, the step size is 1 x 1, the filling mode is SAME, then normalization is carried out, and the activation function is a relu function, so that an image conv1_1 is obtained. Then, the image conv1_1 is up-sampled, the number of convolution kernels is 1, the size of a convolution window is 1 × 1, the sampling size is 2, and the step size is 1 × 1, so that a result output image output1 of the first module is obtained.
Meanwhile, a convolution operation is carried out on the image conv1_1, with 128 convolution kernels, a convolution window of 1 x 1, a step size of 2 x 2 and SAME padding, followed by normalization, to obtain an image rconv1_1.
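The 1 x 1 shortcut convolution that produces rconv1_1 mixes channels per pixel and, with stride 2, keeps every other spatial position. A numpy sketch (random values stand in for the learned weights, and normalization is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
conv1_1 = rng.random((16, 16, 64))  # stand-in for the first module's output

def conv1x1(x: np.ndarray, weights: np.ndarray, stride: int = 1) -> np.ndarray:
    """1x1 convolution: mixes channels per pixel via (H, W, C_in) @
    (C_in, C_out), then subsamples spatially by the stride."""
    y = x @ weights
    return y[::stride, ::stride]

w = rng.random((64, 128))  # stand-in for the 128 trained 1x1 kernels
rconv1_1 = conv1x1(conv1_1, w, stride=2)
```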
2) A second edge detection module: the input image is the conv1_1 image of the step 1), the number of convolution kernels of the first layer is 128, the size of a convolution window is 3 × 3, the step size is 1 × 1, the filling mode is SAME, then normalization is carried out, the activation function is a relu function, the number of convolution kernels of the second layer is 128, the size of the convolution window is 3 × 3, the step size is 1 × 1, the filling mode is SAME, and then normalization is carried out, so that the image conv2_1 is obtained. Then, the image conv2_1 is up-sampled, the number of convolution kernels is 1, the size of the convolution window is 1 × 1, the sampling size is 2, and the step size is 1 × 1, so that the result output map output2 of the second module is obtained.
Meanwhile, the image conv2_1 is convolved again, with 128 convolution kernels, a convolution window of 3 x 3, a step size of 2 x 2 and SAME padding, to obtain an image conv2_2; the image add2_1 is then obtained by adding conv2_2 and the image rconv1_1 from step 1). Convolution is performed on add2_1 with 256 convolution kernels, a convolution window of 1 x 1, a step size of 2 x 2 and SAME padding, followed by normalization, to obtain an image rconv2_1.
3) A third edge detection module: the input image is the conv2_2 image from step 2); with 256 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and SAME padding, followed by normalization, the image conv3_1 is obtained. The image add2_1 from step 2) is then processed through 2 cycles, where each cycle is structured as follows: the image add2_1 first passes through a relu activation function and then through two convolution layers; the first layer has 256 convolution kernels, a convolution window of 3 x 3, a step size of 1 x 1, SAME padding, normalization and a relu activation function; the second layer has 256 convolution kernels, a convolution window of 3 x 3, a step size of 1 x 1, SAME padding and normalization. The two convolution layers yield an image conv3_2, and an averaging operation on conv3_2 and conv3_1 gives the image conv3_mean. For convenience, image names inside the cycle body are kept unchanged across cycles; the similar cycle structures below follow the same convention and are not described again. The image conv3_mean is then up-sampled with 1 convolution kernel, a convolution window of 1 x 1, a sampling size of 4 and a step size of 1 x 1, giving the result output map output3 of the third module.
Meanwhile, the image conv3_mean is convolved with 256 convolution kernels, a convolution window of 3 x 3, a step size of 2 x 2 and SAME padding, to obtain an image conv3_3; adding conv3_3 to the image rconv2_1 from step 2) gives the image add3_1. Convolution is performed on add3_1 with 512 convolution kernels, a convolution window of 1 x 1, a step size of 2 x 2 and SAME padding, followed by normalization, to obtain an image rconv3_1.
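Our reading of the cycle structure in the third module (a pre-activation relu, two convolution layers, then an averaging operation with the running side branch) can be sketched as follows; the convolution layers are abstracted as a callable, so this shows only the data flow, not the trained weights:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def loop_body(add_in: np.ndarray, side: np.ndarray, convs, cycles: int = 2):
    """Data flow of the repeated cycle: pre-activation relu, the two
    convolution layers (abstracted as `convs`), then averaging with the
    running side branch (conv3_1 / conv3_mean in the text)."""
    conv_mean = side
    for _ in range(cycles):
        conv3_2 = convs(relu(add_in))            # two conv layers + norm
        conv_mean = (conv3_2 + conv_mean) / 2.0  # averaging operation
    return conv_mean

# Deterministic toy run: identity "convolutions" on constant inputs.
add_in = np.ones((2, 2))
side = np.zeros((2, 2))
mean = loop_body(add_in, side, convs=lambda x: x, cycles=2)
# Cycle 1: (1 + 0)/2 = 0.5; cycle 2: (1 + 0.5)/2 = 0.75
```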
4) A fourth edge detection module: the input image is the conv2_2 image from step 2); after one convolution with 256 convolution kernels, a convolution window of 1 x 1, a step size of 2 x 2 and SAME padding, the image conv4_1 is obtained. Adding conv4_1 to the image conv3_3 from step 3) gives the image add4_1. The image add4_1 is then convolved with 512 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and SAME padding, and normalized, to obtain an image conv4_2. The image add3_1 from step 3) is further processed with one convolution of 512 kernels, a convolution window of 1 x 1, a step size of 1 x 1 and SAME padding, and normalized, to obtain an image conv4_3. The image conv4_3 then goes through 3 cycles, where each cycle is structured as follows: conv4_3 passes through a relu activation function and then through two convolution layers; the first layer has 256 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1, SAME padding, normalization and a relu activation function; the second layer has 512 convolution kernels, a convolution window of 3 x 3, a step size of 1 x 1, SAME padding and normalization, yielding an image conv4_4. Adding conv4_3 and conv4_4 gives the image add4_2, and an averaging operation on add4_2 and conv4_3 gives the image conv4_mean.
The image conv4_mean is up-sampled with 1 convolution kernel, a convolution window of 1 x 1, a sampling size of 8 and a step size of 1 x 1, giving the result output map output4 of the fourth module.
Meanwhile, the image conv4_mean is convolved with 512 convolution kernels, a convolution window of 3 x 3, a step size of 2 x 2 and SAME padding, to obtain an image conv4_5; adding conv4_5 to the image rconv3_1 from step 3) gives the image add4_3. Convolution is performed on add4_3 with 512 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and SAME padding, followed by normalization, to obtain an image rconv4_1.
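The upsampling steps restore each side output to the input resolution (sampling size 8 for the fourth module). The model uses a learned single-kernel upsampling layer; a nearest-neighbour stand-in shows the shape effect:

```python
import numpy as np

def upsample_nn(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor (a simple
    stand-in for the learned upsampling layer in the model)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

conv4_mean = np.random.default_rng(3).random((8, 8))
output4 = upsample_nn(conv4_mean, 8)  # back to 64 x 64
```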
5) A fifth edge detection module: the input image is the conv4_1 image from step 4); after one convolution with 512 convolution kernels, a convolution window of 1 x 1, a step size of 2 x 2 and SAME padding, the image conv5_1 is obtained. Adding conv5_1 to the image conv4_5 from step 4) gives the image add5_1. The image add5_1 is then convolved with 512 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1 and SAME padding, and normalized, to obtain an image conv5_2. The image add4_3 from step 4) is processed through 3 cycles, where each cycle is structured as follows: add4_3 passes through a relu activation function and then through two convolution layers; the first layer has 256 convolution kernels, a convolution window of 1 x 1, a step size of 1 x 1, SAME padding, normalization and a relu activation function; the second layer has 512 convolution kernels, a convolution window of 3 x 3, a step size of 1 x 1, SAME padding and normalization, yielding an image conv5_3. Adding conv5_3 and add4_3 gives the image add5_2, and an averaging operation on add5_2 and conv5_2 gives the image conv5_mean. The image conv5_mean is up-sampled with 1 convolution kernel, a convolution window of 1 x 1, a sampling size of 16 and a step size of 1 x 1, giving the result output map output5 of the fifth module.
Meanwhile, the image conv5_mean is added to the image rconv4_1 from step 4) to obtain an image add5_3.
6) A sixth edge detection module: the input image is the add5_3 image from step 5). It is first convolved with 256 convolution kernels, a 1 × 1 convolution window, a 1 × 1 step size, and SAME padding, and normalized to obtain an image conv6_1. The image conv5_mean from step 5) is then convolved with 256 convolution kernels, a 1 × 1 convolution window, a 1 × 1 step size, and SAME padding, and normalized to obtain an image conv6_2. The image conv6_1 is then processed through 3 loop iterations with the following structure: the image conv6_1 passes through a relu activation function and then through two convolution layers; the first layer has 128 convolution kernels, a 1 × 1 convolution window, a 1 × 1 step size, and SAME padding, followed by normalization with a relu activation function; the second layer has 256 convolution kernels, a 3 × 3 convolution window, a 1 × 1 step size, and SAME padding, followed by normalization, yielding an image conv6_3. The image conv6_3 is added to the image add5_2 from step 5) to obtain an image add6_1, and the images add6_1 and conv6_2 are then averaged to obtain an image conv6_mean. The image conv6_mean is upsampled with 1 convolution kernel, a 1 × 1 convolution window, a sample size of 16, and a 1 × 1 step size, yielding the result output map output6 of the sixth module.
Splicing the images output1-output6 obtained in steps 1) to 6) yields a one-dimensional array outputs_1. The array outputs_1 is converted into a three-dimensional array outputs_3 using a concat function, and outputs_3 is then convolved once with 1 convolution kernel, a 1 × 1 convolution window, and a 1 × 1 step size to obtain an image conv7_1, which is the output edge detection effect map.
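Numerically, the fusion step can be sketched as follows: a 1 × 1 convolution with a single kernel over the six stacked side outputs is simply a per-pixel weighted sum across the channel axis. The NumPy sketch below illustrates that reading; the uniform weights are placeholder values, not ones from the patent.

```python
import numpy as np

def fuse_side_outputs(side_outputs, weights, bias=0.0):
    """Fuse side-output maps with a single 1x1 convolution kernel.

    side_outputs: list of H x W arrays (output1..output6).
    weights: one scalar per side output (the 1x1 kernel weights).
    Stacking along a channel axis and taking the dot product across
    that axis is exactly what a 1x1 conv with one kernel computes.
    """
    stack = np.stack(side_outputs, axis=-1)            # H x W x 6
    return stack @ np.asarray(weights, dtype=float) + bias

# six constant 4x4 side outputs with values 1..6
outs = [np.full((4, 4), float(v)) for v in range(1, 7)]
fused = fuse_side_outputs(outs, weights=[1 / 6.0] * 6)
```

In training these kernel weights would be learned along with the rest of the network rather than fixed to a uniform average.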
Preferably, the global learning rate of the supervised scotopic vision image edge detection model of the convolutional neural network is 0.0001, the number of iterations is 15000, and the input image is an RGB image.
Preferably, the weights from the last 5 training iterations of the convolutional neural network edge detection model are saved, and the weight from the final iteration, numbered 14999, is used by default. Verification is performed using the test set images in the scotopic vision image dataset and images captured in a scotopic environment, and the image edge detection results are saved as pictures.
Preferably, the up-sampling method used by the supervised scotopic vision image edge detection model of the convolutional neural network is transposed convolution.
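Transposed convolution, the upsampling method named above, can be written out directly: each input pixel stamps a stride-spaced, scaled copy of the kernel onto a larger output canvas. The single-channel NumPy sketch below is a minimal illustration (no output cropping or learned kernel), not the model's actual upsampling layer.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride):
    """Naive transposed convolution (a.k.a. deconvolution).

    Each input pixel x[i, j] adds x[i, j] * kernel to the output at
    offset (i*stride, j*stride), so the output grows by the stride
    factor -- the inverse of how a strided conv shrinks its input.
    """
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * kernel
    return out

# a 2x2 input, 2x2 kernel of ones, stride 2: four non-overlapping stamps
up = transposed_conv2d(np.ones((2, 2)), np.ones((2, 2)), stride=2)
```

With overlapping stamps (stride smaller than the kernel), contributions sum, which is the source of the well-known checkerboard artifacts of transposed convolution.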
The loss function of the supervised scotopic vision image edge detection model of the convolutional neural network is as follows:
L(W, w) = Σ_{n=1}^{6} ℓ^{(n)}(W, w^{(n)})

ℓ^{(n)}(W, w^{(n)}) = −β Σ_{j∈Y⁺} log σ(y_j = 1 | X; W, w^{(n)}) − (1 − β) Σ_{j∈Y⁻} log σ(y_j = 0 | X; W, w^{(n)})

β = |Y⁻| / (|Y⁺| + |Y⁻|)

1 − β = |Y⁺| / (|Y⁺| + |Y⁻|)
wherein A_n(·) denotes the output of each module of the supervised image edge detection model, n denotes the index of each module's output image, W denotes the set of all parameters in the model, and w denotes the corresponding parameter set;

σ(y_j = 1 | X; W, w)

denotes the network's prediction for pixel j of the input image X; β denotes the coefficient of each term in the loss function, Y⁺ denotes the non-edge data in the edge-labeled image dataset, σ(·) denotes the scale level of each weight, y_j indicates whether the pixel is marked as an edge, X denotes the input image, and Y⁻ denotes the edge data in the edge-labeled image dataset.
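Under a class-balanced cross-entropy reading of this loss, where the β weighting comes from the pixel counts as in the 1 − β = |Y⁺|/(|Y⁺| + |Y⁻|) relation above, a NumPy sketch might look as follows. This is an assumption-laden illustration: the exact per-module weighting and the sign conventions of the patent's original (image-only) formula are not fully recoverable.

```python
import numpy as np

def balanced_bce(prob, label, eps=1e-8):
    """Class-balanced cross-entropy for edge maps.

    prob: predicted edge probabilities in (0, 1).
    label: 1 for edge pixels, 0 otherwise.
    beta is the fraction of non-edge pixels, so the scarce edge
    pixels receive the larger weight in the positive term.
    """
    pos = label == 1
    neg = label == 0
    beta = neg.sum() / label.size
    return -(beta * np.log(prob[pos] + eps).sum()
             + (1 - beta) * np.log(1 - prob[neg] + eps).sum())

# one confident edge pixel and one confident non-edge pixel
loss = balanced_bce(np.array([0.9, 0.1]), np.array([1, 0]))
```

Without the β/(1 − β) balancing, the overwhelming majority of non-edge pixels would dominate the gradient and push the network toward predicting no edges at all.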
The process of calculating the error between the edge detection image and the edge-labeled image of the same scene under normal illumination is as follows: obtain the set of edge points of the edge-labeled image of the same scene under normal illumination; obtain the set of edge points of the image output by the edge detection model; and calculate the error between the two according to the error formula;
the error formula is:
error = (1 / |M_new|) Σ_{m ∈ M_new} min_{m_o ∈ M_o} ||m − m_o||_Euclid

wherein error represents the error between the edge detection image and the edge-labeled image of the same scene under normal illumination, M_new represents the set of edge points of the image output by the edge detection model, M_o represents the set of edge points of the edge-labeled image of the same scene under normal illumination, m represents an edge point of the image output by the edge detection model, and || · ||_Euclid represents the Euclidean distance between the m-th labeled edge point and the detected edge point.
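One plausible implementation of this error, reading it as the mean Euclidean distance from each detected edge point to its nearest labeled edge point (the exact pairing rule is not fully recoverable from the translation), is the short NumPy sketch below.

```python
import numpy as np

def edge_error(detected, labeled):
    """Mean nearest-neighbor distance from detected to labeled points.

    detected, labeled: iterables of (row, col) edge coordinates.
    Broadcasting builds the full M x N distance matrix, then each
    detected point keeps only its closest labeled point.
    """
    d = np.asarray(detected, dtype=float)[:, None, :]   # M x 1 x 2
    l = np.asarray(labeled, dtype=float)[None, :, :]    # 1 x N x 2
    dists = np.sqrt(((d - l) ** 2).sum(axis=-1))        # M x N
    return dists.min(axis=1).mean()

# one perfect match (distance 0) and one point a 3-4-5 triangle away
err = edge_error([[0, 0], [3, 4]], [[0, 0]])
```

For large point sets a k-d tree would replace the quadratic distance matrix, but the definition of the metric is unchanged.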
As shown in fig. 3, the test set images in the scotopic vision image dataset are passed through the trained supervised image edge detection model to obtain the parameter values of ODS (fixed contour threshold), OIS (optimal threshold for each image), and AP (average precision).
As shown in fig. 4, the scotopic vision images captured by a camera are passed through the trained supervised image edge detection model to obtain the parameter values of PSNR (peak signal-to-noise ratio) and MSE (mean square error).
The above embodiments further illustrate the objects, technical solutions, and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, or improvements made within the spirit and principle of the present invention shall be included in its protection scope.

Claims (8)

1. A supervised scotopic vision image edge detection method based on a convolutional neural network, characterized by comprising the following steps: acquiring a scotopic vision image, and inputting the scotopic vision image into a trained optimized supervised image edge detection model for scotopic vision image edge detection to obtain an edge detection result; the optimized supervised image edge detection model consists of six edge detection modules and a splicing module;
the process of training the optimization supervision image edge detection model comprises the following steps:
s1: acquiring a scotopic vision original image data set and an edge label image data set of the same scene under a normal illumination condition, dividing the data sets into a training set and a testing set, and simultaneously performing data enhancement processing on a scotopic vision original image in the training set and an edge label image of the same scene under the normal illumination condition to obtain an amplified training sample set;
s2: inputting the images in the amplified training set into an optimized supervised image edge detection model to obtain an effect process diagram of six edge detection modules; the structure of the six edge detection modules is as follows:
the first module comprises two convolution layers, the number of convolution kernels of the first layer is 32, the size of a convolution window is 3 x 3, the step size is 2 x 2, and an activation function is a relu function; the number of convolution kernels in the second layer is 64, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the second module comprises two convolution layers, the number of convolution kernels of the first layer is 128, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function; the number of the convolution kernels in the second layer is 128, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the third module comprises a convolution layer and two same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: the number of convolution kernels is 256, the size of a convolution window is 3 x 3, the step size is 1 x 1, and the activation function is a relu function;
the fourth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the fifth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 512, the size of a convolution window is 1 x 1, and the step size is 2 x 2; the convolution structure is: the number of convolution kernels of the first convolution layer is 256, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 512, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
the sixth module comprises a convolution layer and three same convolution structures; the number of convolution kernels of the convolution layer is 256, the size of a convolution window is 1 x 1, and the step size is 1 x 1; the convolution structure is: a relu activation function, the number of convolution kernels of the first convolution layer is 128, the size of a convolution window is 1 x 1, the step size is 1 x 1, and the activation function is a relu function; the number of convolution kernels of the second convolution layer is 256, the size of a convolution window is 3 x 3, and the step size is 1 x 1;
s3: splicing the process graphs with the six edge detection effects, converting the spliced images into three-dimensional images, and performing convolution on the three-dimensional images to obtain edge detection images;
s4: calculating a loss function of the optimized supervised image edge detection model, and calculating the error between the obtained edge detection image in the training process and the edge marking image of the same scene under the normal illumination condition according to the loss function;
s5: continuously adjusting the weight of the loss function, and saving the training weight parameter of the model when the value of the loss function is minimum;
s6: inputting the data in the test set into an optimized supervised image edge detection model for testing;
s7: and outputting an edge detection result, and finishing the model training.
2. The method according to claim 1, wherein the process of acquiring the scotopic vision image data set comprises: extracting the R, G, and B channels of images under normal illumination to obtain R1, G1, and B1 images; linearly changing the gray levels of the R1, G1, and B1 images to the range 0-47 to obtain R2, G2, and B2 images; recombining the R2, G2, and B2 images to obtain images in a dark-vision environment; and collecting the recombined images to obtain the scotopic vision image data set.
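The channel-compression procedure of claim 2 can be sketched in NumPy: each channel's gray levels in [0, 255] are linearly mapped to [0, 47] and the channels are recombined. The rounding mode is an assumption on my part; the claim specifies only a linear change.

```python
import numpy as np

def to_scotopic(rgb):
    """Simulate a dark-vision image from a normally lit RGB image.

    Operates on all three channels via vectorization (equivalent to
    splitting R/G/B, rescaling each, and recombining): gray levels
    are linearly mapped from [0, 255] to [0, 47].
    """
    return np.rint(rgb.astype(float) * 47.0 / 255.0).astype(np.uint8)

# a single pixel exercising the extremes and the midpoint
dark = to_scotopic(np.array([[[255, 0, 128]]], dtype=np.uint8))
```

Capping the dynamic range at 48 gray levels is what gives the synthesized training images their scotopic character while keeping them pixel-aligned with the normally lit edge labels.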
3. The supervised scotopic vision image edge detection method based on the convolutional neural network as claimed in claim 1, wherein the stitching module is configured to stitch the edge detection effect process graphs obtained by the six edge detection modules, transform the stitched image into a three-dimensional image using a concat function, and convolve the three-dimensional image; the convolution uses 1 convolution kernel, a 1 × 1 convolution window, and a 1 × 1 step size.
4. The method according to claim 1, wherein the images in the scotopic vision image data set are subjected to enhancement processing, including random cropping, image flipping, and rotation.
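The augmentations of claim 4 must be applied identically to the scotopic image and its edge label so that supervision stays pixel-aligned. The NumPy sketch below illustrates this; the crop size, flip probability, and restriction to 90-degree rotations are illustrative assumptions, not values from the patent.

```python
import numpy as np

def augment(img, label, rng, crop=32):
    """Apply the same random crop / flip / rotation to image and label."""
    H, W = img.shape[:2]
    # random crop: identical window for both arrays
    y = int(rng.integers(0, H - crop + 1))
    x = int(rng.integers(0, W - crop + 1))
    img, label = img[y:y+crop, x:x+crop], label[y:y+crop, x:x+crop]
    # random horizontal flip
    if rng.random() < 0.5:
        img, label = img[:, ::-1], label[:, ::-1]
    # random rotation by a multiple of 90 degrees
    k = int(rng.integers(0, 4))
    return np.rot90(img, k), np.rot90(label, k)

rng = np.random.default_rng(0)
img = np.arange(64 * 64).reshape(64, 64)
lab = (img % 2).astype(np.uint8)
a, b = augment(img, lab, rng)
```

Arbitrary-angle rotations would additionally require interpolation and edge-label re-thresholding, which is why many edge-detection pipelines stick to axis-aligned transforms.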
5. The supervised scotopic vision image edge detection method based on the convolutional neural network as claimed in claim 1, wherein the loss function of the optimized supervised image edge detection model is as follows:
ℓ(W, w) = −β Σ_{j∈Y⁺} log σ(y_j = 1 | X; W, w) − (1 − β) Σ_{j∈Y⁻} log σ(y_j = 0 | X; W, w), with β = |Y⁻| / (|Y⁺| + |Y⁻|)
wherein A_n(·) denotes the output of each module of the supervised image edge detection model, n denotes the index of each module's output image, W denotes the set of all parameters in the model, and w denotes the corresponding parameter set;

σ(y_j = 1 | X; W, w)

denotes the network's prediction for pixel j of the input image X; β denotes the coefficient of each term in the loss function, Y⁺ denotes the non-edge data in the edge-labeled image dataset, σ(·) denotes the scale level of each weight, y_j indicates whether a pixel of the edge-detected image is marked as an edge, X denotes the input image, and Y⁻ denotes the edge data in the edge-labeled image dataset.
6. The supervised scotopic vision image edge detection method based on the convolutional neural network as claimed in claim 5, wherein the coefficients of the loss function are optimized, and the optimization formula of the coefficients of the loss function is as follows:
β = λ · |Y⁻| / (|Y⁺| + |Y⁻|)
where λ represents the weight controlling the positive and negative samples.
7. The supervised scotopic vision image edge detection method based on the convolutional neural network as claimed in claim 5, wherein the weight λ controlling the positive sample and the negative sample is set, the initial value of λ is set to be between 0.6 and 1.2, and the initial value of λ is continuously updated through model training.
8. The method for supervised scotopic vision image edge detection based on the convolutional neural network as claimed in claim 1, wherein the process of calculating the error between the edge detection image and the edge-labeled image of the same scene under normal illumination is as follows: obtain the set of edge points of the edge-labeled image of the same scene under normal illumination; obtain the set of edge points of the image output by the edge detection model; and calculate the error between the two according to the error formula;
the error formula is:
error = (1 / |M_new|) Σ_{m ∈ M_new} min_{m_o ∈ M_o} ||m − m_o||_Euclid

wherein error represents the error between the edge detection image and the edge-labeled image of the same scene under normal illumination, M_new represents the set of edge points of the image output by the edge detection model, M_o represents the set of edge points of the edge-labeled image of the same scene under normal illumination, m represents an edge point of the image output by the edge detection model, and || · ||_Euclid represents the Euclidean distance between the m-th labeled edge point and the detected edge point.
CN202011161185.6A 2020-10-27 2020-10-27 Method for monitoring dark vision image edge detection based on convolutional neural network Active CN112258537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011161185.6A CN112258537B (en) 2020-10-27 2020-10-27 Method for monitoring dark vision image edge detection based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN112258537A CN112258537A (en) 2021-01-22
CN112258537B true CN112258537B (en) 2022-08-26

Family

ID=74262029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011161185.6A Active CN112258537B (en) 2020-10-27 2020-10-27 Method for monitoring dark vision image edge detection based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112258537B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239199B (en) * 2021-05-18 2022-09-23 重庆邮电大学 Credit classification method based on multi-party data set
CN113361693B (en) * 2021-06-30 2022-10-25 北京百度网讯科技有限公司 Method and device for generating convolutional neural network, and image recognition method and device
CN114693712A (en) * 2022-04-08 2022-07-01 重庆邮电大学 Dark vision and low-illumination image edge detection method based on deep learning
CN116908178B (en) * 2023-09-13 2024-03-08 吉林农业大学 Hypha phenotype acquisition device and method

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108734717A (en) * 2018-04-17 2018-11-02 西北工业大学 The dark weak signal target extracting method of single frames star chart background based on deep learning
CN109492580A (en) * 2018-11-08 2019-03-19 北方工业大学 Multi-size aerial image positioning method based on full convolution network field saliency reference
WO2019071990A1 (en) * 2017-10-11 2019-04-18 中兴通讯股份有限公司 Image processing method and apparatus
CN110992342A (en) * 2019-12-05 2020-04-10 电子科技大学 SPCP infrared small target detection method based on 3DATV constraint
CN111797841A (en) * 2020-05-10 2020-10-20 浙江工业大学 Visual saliency detection method based on depth residual error network
CN111815528A (en) * 2020-06-30 2020-10-23 上海电力大学 Bad weather image classification enhancement method based on convolution model and feature fusion

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10282834B1 (en) * 2018-06-22 2019-05-07 Caterpillar Inc. Measurement platform that automatically determines wear of machine components based on images


Non-Patent Citations (2)

Title
"Age and gender classification using convolutional neural networks"; Gil Levi; IEEE Conference on Computer Vision & Pattern Recognition Workshops, 2015; 2015-10-26; pp. 34-42 *
"空间变化模糊的图像复原算法" (Image restoration algorithms for spatially varying blur); Wan Yu; China Masters' Theses Full-text Database, Information Science and Technology; 2019-07-15 (No. 07); pp. I138-1027 *


Similar Documents

Publication Publication Date Title
CN112258537B (en) Method for monitoring dark vision image edge detection based on convolutional neural network
CN110232394B (en) Multi-scale image semantic segmentation method
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN110766632A (en) Image denoising method based on channel attention mechanism and characteristic pyramid
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN112364931B (en) Few-sample target detection method and network system based on meta-feature and weight adjustment
CN111160229B (en) SSD network-based video target detection method and device
CN112465759A (en) Convolutional neural network-based aeroengine blade defect detection method
CN109034184B (en) Grading ring detection and identification method based on deep learning
CN113902625A (en) Infrared image enhancement method based on deep learning
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN112862774A (en) Accurate segmentation method for remote sensing image building
CN112598657B (en) Defect detection method and device, model construction method and computer equipment
CN110930409A (en) Salt body semantic segmentation method based on deep learning and semantic segmentation model
CN113888461A (en) Method, system and equipment for detecting defects of hardware parts based on deep learning
CN115147418B (en) Compression training method and device for defect detection model
CN115775236A (en) Surface tiny defect visual detection method and system based on multi-scale feature fusion
CN111539456B (en) Target identification method and device
KR20220167824A (en) Defect detection system and method through image completion based on artificial intelligence-based denoising
CN115908995A (en) Digital instrument reading identification method and device, electronic equipment and storage medium
CN116485885A (en) Method for removing dynamic feature points at front end of visual SLAM based on deep learning
CN111539931A (en) Appearance abnormity detection method based on convolutional neural network and boundary limit optimization
CN110969630A (en) Ore bulk rate detection method based on RDU-net network model
CN116843725B (en) River surface flow velocity measurement method and system based on deep learning optical flow method
CN117975267A (en) Remote sensing image change detection method based on twin multi-scale cross attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant