CN113781510A - Edge detection method and device and electronic equipment

Publication number: CN113781510A
Application number: CN202111080227.8A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: image, edge, feature, pixel point, layer
Inventors: 孙科, 崔渊, 瞿翊
Original and current assignee: Shanghai Kingstar Fintech Co Ltd
Application filed by Shanghai Kingstar Fintech Co Ltd; priority to CN202111080227.8A
Legal status: Pending

Classifications

    • G06T 7/13: Edge detection (G06 Computing; G06T Image data processing or generation, in general; G06T 7/00 Image analysis; G06T 7/10 Segmentation; edge detection)
    • G06N 3/045: Combinations of networks (G06N Computing arrangements based on specific computational models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08: Learning methods
    • G06T 2207/20081: Training; Learning (G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30176: Document (G06T 2207/30 Subject of image; context of image processing)

Abstract

The application discloses an edge detection method, an edge detection device and electronic equipment. The method comprises: acquiring a target image, wherein the target image is an image obtained by collecting a target object; and inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and each pixel point detection result represents whether the corresponding pixel point is an edge pixel point of the target object. The edge detection model is constructed based on a backbone network and comprises at least a plurality of feature layers; edge feature extraction is performed on the target image according to a different scale parameter on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used to obtain the edge image.

Description

Edge detection method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an edge detection method and apparatus, and an electronic device.
Background
When a document image uploaded by photographing is analyzed intelligently, the region where the document is located usually needs to be found in the whole document image; that is, by detecting the edge contour of the document, the document image is cropped, scaled and otherwise processed into an image effect similar to that of a scanned copy. Therefore, high accuracy is required for edge detection of the document.
In the current edge detection scheme, an algorithm based on deep learning is generally adopted: an edge detection model is trained with training samples containing edge labels, so that the edge detection model can identify document edges in a document image.
However, this scheme still suffers from low accuracy.
Disclosure of Invention
In view of the above, the present application provides an edge detection method, an edge detection device and an electronic apparatus, so as to overcome the low accuracy of current edge detection.
The application provides an edge detection method, which comprises the following steps:
acquiring a target image, wherein the target image is an image acquired by acquiring a target object;
inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
In the above method, preferably, the edge detection model further includes a fusion layer, convolution layers respectively corresponding to each of the feature layers, and deconvolution layers respectively corresponding to each of the feature layers;
wherein the edge detection model outputs the edge image by:
respectively extracting edge features of the target image according to different scale parameters on each feature layer in the edge detection model to obtain edge feature images of different scales;
on the convolution layer corresponding to each feature layer, performing convolution on the edge feature image according to the convolution kernel size corresponding to the scale parameter to obtain a convolution feature image corresponding to each convolution layer;
on the deconvolution layer corresponding to each feature layer, according to the deconvolution parameters corresponding to the scale parameters, performing deconvolution on the convolution feature images respectively to obtain a size reduction image corresponding to each deconvolution layer;
on the fusion layer, fusing the size reduction images corresponding to each deconvolution layer according to an image channel to obtain a fusion characteristic image;
and performing convolution processing on the fusion characteristic image to obtain an edge image of a single image channel, wherein the edge image comprises a plurality of pixel point detection results, each pixel point detection result corresponds to a pixel point in the target image, a pixel point detection result with a first pixel value represents that the corresponding pixel point is an edge pixel point of the target object, and a pixel point detection result with a second pixel value represents that the corresponding pixel point is a non-edge pixel point.
In the method, preferably, in the edge detection model, at least one of the convolution layers includes at least two special-shaped (asymmetric) convolution layers, and each special-shaped convolution layer corresponds to a special-shaped convolution kernel size;
when the convolutional layers include at least two special-shaped convolutional layers, on the convolutional layer corresponding to each feature layer, the edge feature image is convolved according to the size of the convolution kernel corresponding to the scale parameter, so as to obtain a convolution feature image corresponding to each convolutional layer, including:
convolving the edge feature images corresponding to the corresponding feature layers according to the sizes of the special-shaped convolution kernels corresponding to the special-shaped convolution layers respectively to obtain special-shaped feature images corresponding to each special-shaped convolution layer;
wherein, when the convolution layer corresponding to the deconvolution layer includes at least two special-shaped convolution layers, on the deconvolution layer corresponding to each feature layer, according to the deconvolution parameter corresponding to the scale parameter, deconvolution is performed on the convolution feature image, so as to obtain a size reduction image corresponding to each deconvolution layer, the method includes:
deconvoluting the special-shaped characteristic images according to deconvolution parameters corresponding to the sizes of the special-shaped convolution kernels respectively to obtain special-shaped restored images corresponding to the special-shaped convolution layers;
and fusing the special-shaped restored images according to image channels to obtain the size restored images corresponding to the deconvolution layer.
In the above method, preferably, the edge detection model is obtained by training in the following manner:
obtaining a training sample, wherein the training sample comprises a training image and a standard image, the training image is an image obtained by collecting a training object, the standard image comprises a plurality of pixel point standard results, and the pixel point standard results represent whether corresponding pixel points are edge pixel points of the training object;
obtaining a loss function value of the edge detection model under model parameters by using the training image and the standard image; the model parameters at least comprise: a hierarchical weight parameter for the feature layer;
and adjusting the model parameters according to the loss function values so that the loss function values meet the model convergence condition.
Preferably, the method for obtaining the loss function value of the edge detection model under the model parameters by using the training image and the standard image includes:
obtaining a pixel error value between the training image and the standard image on each pixel point according to the model parameters in the edge detection model;
obtaining a probability error value between the training image and the standard image on each pixel point according to a model parameter in the edge detection model, wherein the probability error value is an error value of a prediction probability value of the edge pixel point of the training object predicted by each pixel point;
and obtaining a loss function value of the edge detection model under model parameters according to the pixel error value and/or the probability error value.
Preferably, the above method, obtaining a probability error value between the training image and the standard image at each pixel point according to a model parameter in the edge detection model, includes:
according to model parameters in the edge detection model, obtaining a first prediction probability value corresponding to each feature layer on each pixel point in the training image, wherein the first prediction probability value is the prediction probability value of the edge pixel point of the training object predicted by each pixel point on the corresponding feature layer;
for each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of each pixel point which is predicted as an edge feature point of the training object;
and obtaining a probability error value between the training image and the standard image on each pixel point according to the pixel point standard result in the standard image and the second prediction probability value.
In the above method, preferably, the model parameters further include: probability weight parameters of the feature layer;
for each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of each pixel point which is predicted as the edge feature point of the training object, including:
and for each pixel point, performing weighted summation on the first prediction probability values on all the feature layers by using the probability weight parameters of the feature layers to obtain a second prediction probability value of each pixel point which is predicted as the edge feature point of the training object.
In the above method, preferably, before performing edge feature extraction on the target image according to different scale parameters on each feature layer in the edge detection model to obtain edge feature images of different scales, the method further includes:
processing the target image into an image of a target size;
and the number of pixel point detection results contained in the edge image is matched with the target size.
The present application further provides an edge detection apparatus, the apparatus comprising:
the image acquisition unit is used for acquiring a target image, and the target image is an image acquired by acquiring a target object;
the image processing unit is used for inputting the target image into an edge detection model so as to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object or not;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
The present application further provides an electronic device, including:
a memory for storing an application program and data generated by the application program running;
a processor for executing the application program to implement the following functions: acquiring a target image, wherein the target image is an image acquired by acquiring a target object; inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
According to the technical scheme, the edge detection method, the edge detection device and the electronic equipment disclosed by the application have the advantages that the plurality of feature layers with different scale parameters are constructed in the edge detection model, so that the edge feature information on different scales is extracted from the edge detection model through the feature layers, and then the edge pixel points of the target object contained in the target image are detected by using the edge feature information on different scales. Therefore, in the application, different from the implementation scheme of detecting the edge pixel points of the target object by using the edge feature information of a single scale, the edge pixel points of the target object are detected by using the edge feature information of different scales, so that the accuracy of edge detection is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of an edge detection method according to an embodiment of the present disclosure;
FIGS. 2 and 3 are diagrams illustrating an application example of the present application;
fig. 4 is a partial flowchart of an edge detection method according to an embodiment of the present application;
FIGS. 5-7 are diagrams of another exemplary application of the present application, respectively;
fig. 8 is another partial flowchart of an edge detection method according to an embodiment of the present application;
FIG. 9 is a diagram of another example application of the present application;
fig. 10 and fig. 11 are further partial flowcharts of an edge detection method according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an edge detection apparatus according to a second embodiment of the present application;
fig. 13 is another schematic structural diagram of an edge detecting device according to a second embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of an electronic device according to a third embodiment of the present application;
FIG. 15 is a diagram of an example of the present application as applied to computer implemented document image edge detection.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a flowchart of an implementation of an edge detection method provided in an embodiment of the present application is shown, where the method may be applied to an electronic device capable of performing data processing, such as a mobile phone, a pad, a notebook, a computer, or a server. The technical scheme in the embodiment is mainly used for improving the accuracy of edge detection.
Specifically, the method in this embodiment may include the following steps:
step 101: a target image is obtained.
The target image is an image obtained by collecting a target object, such as an RGB color image. For example, the target image may be an image obtained by capturing an image of the target object using an image capturing device such as a camera, and the target image may include a pixel region of the target object. For example, the target image is an image obtained by image acquisition of a desktop on which a document such as an identity card is placed; for another example, the target image is an image obtained by image-capturing a desktop on which a document such as a book or a file is placed, or the like.
It should be noted that the image capturing device may be a component configured in the electronic device, such as a camera built in a mobile phone; alternatively, the image capture device may be a separate device connected to the electronic device, such as a camera connected to a computer.
Step 102: and inputting the target image into the edge detection model to obtain an edge image output by the edge detection model.
The edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether the corresponding pixel points are edge pixel points of the target object. That is to say, the edge image forms a matrix image by using pixel point detection results, each matrix element in the edge image is a pixel point detection result, the pixel point detection result corresponds to a pixel point in the target image, and based on this, the pixel point detection result represents whether the pixel point is an edge pixel point of the target object. Specifically, the pixel point detection result may adopt different pixel values to represent whether the corresponding pixel point is an edge pixel point of the target object.
For example, the pixel point detection result represents the corresponding pixel point as the edge pixel point of the target object by the first pixel value, and the pixel point detection result represents the pixel point as the non-edge pixel point by the second pixel value. For example, the edge image is represented by a binary image, and the pixel value of each pixel point in the edge image represents whether the pixel point corresponding to the pixel point in the target image is an edge pixel point of the target object. Specifically, the pixel points in the edge image represent the corresponding pixel points by the pixel value "1" as the edge pixel points of the target object, and the pixel points in the edge image represent the corresponding pixel points by the pixel value "0" as the non-edge pixel points.
It should be noted that the pixel values "1" and "0" in the above example can be understood as the probability that the pixel point is an edge pixel point of the target object, and are different from actual pixel values in the range of 0 to 255.
In addition, an edge detection model for detecting edge pixel points of a target object in a target image is constructed based on a backbone network. Based on this, the edge detection model can be constructed with a deep learning network for large-scale image classification, such as VGGNet, ResNet, Inception or Xception. In order to guarantee edge detection accuracy while further improving edge detection efficiency, the edge detection model can also be constructed with a lightweight deep learning network such as MobileNet.
Based on this, the edge detection model provided in this embodiment at least includes a plurality of feature layers, and of course, includes a convolution layer and a deconvolution layer corresponding to each feature layer. And performing edge feature extraction on the target image according to different scale parameters on each feature layer to obtain edge feature information, wherein the edge feature information extracted on each feature layer is used for obtaining an edge image.
Specifically, after a target image is input into an edge detection model, edge feature extraction is performed on the target image on each feature layer in the edge detection model according to scale parameters corresponding to the feature layers to obtain edge feature information corresponding to each feature layer.
According to the technical scheme, the edge detection method provided by the embodiment of the application has the advantages that the plurality of feature layers with different scale parameters are constructed in the edge detection model, so that the edge feature information on different scales is extracted from the edge detection model through the feature layers, and the edge pixel points of the target object contained in the target image are detected by using the edge feature information on different scales. Therefore, in this embodiment, different from the implementation scheme of detecting the edge pixel of the target object by using the edge feature information of a single scale, the edge pixel of the target object is detected by using the edge feature information of different scales, so that the accuracy of edge detection is improved.
Further, in the present embodiment, after the edge image is obtained, the edge image may be output. For example, the pixel value of the pixel having the pixel detection result of 1 in the edge image is set to 255, and the pixel value of the pixel having the pixel detection result of 0 is set to 0, so that after the edge image is output to the electronic device, the user can see a black edge line, as shown in fig. 2, and the other areas are blank.
In addition, in this embodiment, a pixel point of which the pixel point detection result is 1 in the edge image may be further specially marked in a corresponding position in the target image, for example, a pixel value marked as a highlight is shown in fig. 3, so that when the target image is output, a user can visually view the edge detection result of the target image.
In addition, in this embodiment, image clipping may be performed according to the corresponding position of the pixel point in the edge image where the pixel point detection result is 1 in the target image, so as to obtain an image only including the target object region.
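By way of illustration only, the following is a minimal Python sketch of such cropping; the helper name crop_to_edges and the assumption that edge pixel points are marked with 1 are illustrative and not taken from the present disclosure.

```python
import numpy as np

def crop_to_edges(target_image: np.ndarray, edge_image: np.ndarray) -> np.ndarray:
    """Crop the target image to the bounding box of the detected edge pixel points.

    Assumes edge_image is a 2D array whose pixel point detection result is 1
    for edge pixel points and 0 for non-edge pixel points.
    """
    ys, xs = np.nonzero(edge_image == 1)   # positions whose detection result is 1
    if ys.size == 0:                       # no edge detected: return the image unchanged
        return target_image
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    return target_image[top:bottom + 1, left:right + 1]
```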
In a specific implementation, the edge detection model processes the target image based on the constructed feature layers, convolution layers, deconvolution layers and fusion layer to output an edge image, as shown in fig. 4:
step 401: the target image is processed into an image of a target size.
Specifically, in this embodiment, the target image is scaled according to a preset target size, so as to obtain the target image with the target size. For example, the edge detection model scales the input target image to a size of 256 × 256.
Step 402: and respectively carrying out edge feature extraction on the target image according to different scale parameters on each feature layer in the edge detection model so as to obtain edge feature images with different scales.
For example, as shown in fig. 5, the edge detection model includes at least 5 feature layers, such as Block0_1, Block1_0, Block2_1, Block3_2 and Block5_2, and each feature layer corresponds to a different scale parameter. For example, the scale parameter of Block0_1 is 1/1, that is, the size of the edge feature image output by this feature layer is consistent with that of the target image; the scale parameter of Block1_0 is 1/2, that is, the size of the edge feature image output by this feature layer is 1/2 of the target image; the scale parameter of Block2_1 is 1/4, that is, the size of the edge feature image output by this feature layer is 1/4 of the target image; the scale parameter of Block3_2 is 1/8, that is, the size of the edge feature image output by this feature layer is 1/8 of the target image; and the scale parameter of Block5_2 is 1/16, that is, the size of the edge feature image output by this feature layer is 1/16 of the target image. Based on this, as shown in fig. 6, the edge detection model performs edge feature extraction on the 256x256 target image on Block0_1, Block1_0, Block2_1, Block3_2 and Block5_2 respectively, so as to obtain edge feature images of different sizes: a first edge feature image of 256x256, a second edge feature image of 128x128, a third edge feature image of 64x64, a fourth edge feature image of 32x32, and a fifth edge feature image of 16x16.
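By way of illustration only, a minimal PyTorch-style sketch of five feature layers with the scale parameters 1/1 to 1/16 is given below; the depthwise-separable composition, the channel numbers and the class name MultiScaleBackbone are assumptions made for the sketch and are not the backbone actually claimed in this disclosure.

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    """Five feature layers whose outputs are 1/1, 1/2, 1/4, 1/8 and 1/16 of the
    input size, loosely mirroring Block0_1 ... Block5_2 in FIG. 5."""

    def __init__(self, in_ch: int = 3, ch: int = 16):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin, bias=False),
                nn.Conv2d(cin, cout, 1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.block0_1 = block(in_ch, ch, stride=1)        # scale 1/1  -> 256x256
        self.block1_0 = block(ch, ch * 2, stride=2)       # scale 1/2  -> 128x128
        self.block2_1 = block(ch * 2, ch * 4, stride=2)   # scale 1/4  -> 64x64
        self.block3_2 = block(ch * 4, ch * 8, stride=2)   # scale 1/8  -> 32x32
        self.block5_2 = block(ch * 8, ch * 8, stride=2)   # scale 1/16 -> 16x16

    def forward(self, x):
        f1 = self.block0_1(x)
        f2 = self.block1_0(f1)
        f3 = self.block2_1(f2)
        f4 = self.block3_2(f3)
        f5 = self.block5_2(f4)
        return [f1, f2, f3, f4, f5]   # edge feature images of the five scales
```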
Step 403: and on the convolution layer corresponding to each characteristic layer, performing convolution on the edge characteristic image according to the convolution kernel size corresponding to the scale parameter to obtain a convolution characteristic image corresponding to each convolution layer.
Wherein one convolutional layer corresponds to one feature layer and the different scale parameters correspond to different convolutional kernel sizes. For example, taking the feature layer shown in FIG. 5 as an example, the larger the scale parameter, the larger the corresponding convolution kernel size. Feature layer Block0_1 may correspond to a convolution kernel size of 5x5, feature layer Block2_1 may correspond to a convolution kernel size of 3x3, Block5_2 may correspond to a convolution kernel size of 1x1, and so on.
Based on the above, on each convolution layer in the edge detection model, the corresponding edge feature image on each feature layer is respectively subjected to convolution processing according to the corresponding convolution kernel size, so that the corresponding convolution feature image is obtained. For example, taking the feature layer shown in fig. 5 as an example, the edge detection model performs convolution processing on a 256 × 256 first edge feature image according to a convolution kernel size of 5 × 5 to obtain a first convolution feature image; performing convolution processing on the 128x128 second edge characteristic image according to the convolution kernel size of 3x3 to obtain a second convolution characteristic image; performing convolution processing on the 64x64 third edge feature image according to the convolution kernel size of 3x3 to obtain a third convolution feature image; performing convolution processing on the 32x32 fourth edge feature image according to the convolution kernel size of 1x1 to obtain a fourth convolution feature image; and performing convolution processing on the fifth edge feature image of 16x16 according to the convolution kernel size of 1x1 to obtain a fifth convolution feature image.
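By way of illustration only, the side convolutions with scale-dependent kernel sizes could be sketched as follows; the channel numbers follow the backbone sketch above and are assumptions.

```python
import torch.nn as nn

# Side convolutions applied to the five edge feature images; the kernel sizes
# follow the example in the text (5x5, 3x3, 3x3, 1x1, 1x1).
side_convs = nn.ModuleList([
    nn.Conv2d(16, 8, kernel_size=5, padding=2),    # Block0_1 output, scale 1/1
    nn.Conv2d(32, 8, kernel_size=3, padding=1),    # Block1_0 output, scale 1/2
    nn.Conv2d(64, 8, kernel_size=3, padding=1),    # Block2_1 output, scale 1/4
    nn.Conv2d(128, 8, kernel_size=1),              # Block3_2 output, scale 1/8
    nn.Conv2d(128, 8, kernel_size=1),              # Block5_2 output, scale 1/16
])

def apply_side_convs(edge_feature_images):
    """Convolve each edge feature image with the kernel size tied to its scale."""
    return [conv(f) for conv, f in zip(side_convs, edge_feature_images)]
```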
Step 404: and performing deconvolution on the deconvolution layer corresponding to each characteristic layer according to the deconvolution parameters corresponding to the scale parameters to obtain a size reduction image corresponding to each deconvolution layer.
One deconvolution layer corresponds to one feature layer, and because the scale parameters of the feature layers are different, the sizes of the corresponding obtained convolution feature images are also different, so that the deconvolution parameters corresponding to each feature layer, namely the deconvolution parameters on each deconvolution layer, correspond to the scale parameters, and the deconvolution parameters can contain the restored sizes for restoring the convolution feature images to the original sizes, such as the target sizes. Specifically, in this embodiment, the convolution feature images corresponding to the feature layers may be deconvolved respectively according to deconvolution parameters matched with the size parameters, so as to obtain size restored images corresponding to the deconvolution layers.
For example, taking the feature layer shown in fig. 5 as an example, the edge detection model performs deconvolution on a 256 × 256 first convolution feature image according to corresponding deconvolution parameters to obtain a corresponding 256 × 256 first-size restored image on the feature layer; deconvolving the 128x128 second convolution characteristic image according to the corresponding deconvolution parameters to obtain a corresponding 256x256 second size reduction image on the characteristic layer; deconvolving the 64x64 third convolved feature image according to the corresponding deconvolution parameters to obtain a corresponding 256x256 third size restored image on the feature layer; deconvolving the 32x32 fourth convolved feature image according to the corresponding deconvolution parameters to obtain a corresponding 256x256 fourth size restored image on the feature layer; and deconvolving the 16x16 fifth convolved feature image according to the corresponding deconvolution parameters to obtain a corresponding 256x256 fifth-size restored image on the feature layer.
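By way of illustration only, the deconvolution step that restores every convolution feature image to the 256x256 target size could be sketched as follows; the specific kernel, stride and padding choices are assumptions that merely realize the required output size.

```python
import torch.nn as nn

# Transposed convolutions that restore each convolution feature image to the
# 256x256 target size; strides 2/4/8/16 mirror the scale parameters.
deconvs = nn.ModuleList([
    nn.Identity(),                                                   # 1/1: already 256x256
    nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1),    # 1/2:  128 -> 256
    nn.ConvTranspose2d(8, 8, kernel_size=8, stride=4, padding=2),    # 1/4:   64 -> 256
    nn.ConvTranspose2d(8, 8, kernel_size=16, stride=8, padding=4),   # 1/8:   32 -> 256
    nn.ConvTranspose2d(8, 8, kernel_size=32, stride=16, padding=8),  # 1/16:  16 -> 256
])

def restore_sizes(conv_feature_images):
    """Upsample every convolution feature image back to the target size."""
    return [up(f) for up, f in zip(deconvs, conv_feature_images)]
```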
Step 405: and on the fusion layer, fusing the size reduction images corresponding to each deconvolution layer according to an image channel to obtain a fusion characteristic image.
The corresponding size reduction image on each deconvolution layer contains corresponding edge characteristic information, and the edge characteristic information is obtained by performing edge characteristic extraction, convolution and deconvolution on the target image in the preceding text. Based on this, in this embodiment, the size reduced images corresponding to each deconvolution layer are fused based on the direction of the image channel, for example, the size reduced images output on each deconvolution layer are respectively fused according to three image channels in an RGB color image, so as to obtain fusion feature images on the three image channels. For example, the edge feature images on 5 feature layers of Block0_1, Block1_0, Block2_1, Block3_2 and Block5_2 are convolved and deconvoluted to obtain reduced-size images, which are respectively fused according to image channels of three RGB color images, so as to obtain fused feature images on the three image channels.
Step 406: and performing convolution processing on the fusion characteristic image to obtain an edge image of a single image channel.
In this embodiment, the fusion feature image may be convolved with a convolution kernel size of 1x1 to obtain an edge image of a single image channel. Further, because the edge image is represented by a two-dimensional matrix in which each matrix element value represents the probability that the corresponding pixel point is an edge pixel point of the target object, the matrix element values may be quantized in this embodiment. Specifically, a preset threshold may be adopted to quantize each matrix element value to 1 or 0, indicating that the probability of the pixel point being an edge pixel point of the target object is 100% or 0, respectively. Based on this, the edge image comprises a plurality of pixel point detection results, namely the quantized matrix element values; each pixel point detection result corresponds to a pixel point in the target image, a first pixel value such as 1 represents that the corresponding pixel point is an edge pixel point of the target object, and a second pixel value such as 0 represents that it is a non-edge pixel point. At this time, since the target image is an image of the target size, the number of pixel point detection results contained in the edge image is the same as the number of pixel points in the target image, that is, it matches the target size.
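By way of illustration only, the fusion along the image channel, the 1x1 convolution and the threshold quantization could be sketched as follows; the threshold value of 0.5 and the channel numbers are assumptions.

```python
import torch
import torch.nn as nn

# 5 branches x 8 channels from the earlier sketches are fused and reduced to a
# single channel, then quantized into a binary edge image with a preset threshold.
fuse_conv = nn.Conv2d(5 * 8, 1, kernel_size=1)

def fuse_and_binarize(size_restored_images, threshold: float = 0.5):
    fused = torch.cat(size_restored_images, dim=1)   # fusion along the image channel
    prob = torch.sigmoid(fuse_conv(fused))           # probability of being an edge pixel point
    edge_image = (prob > threshold).float()          # 1 = edge pixel point, 0 = non-edge
    return edge_image.squeeze(1)                     # N x 256 x 256 binary edge image
```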
In addition, since the target image is usually a rectangle with a certain aspect ratio, in order to further improve the accuracy of edge detection, besides constructing feature layers with different scale parameters in the edge detection model, the convolution layers corresponding to one or more of the feature layers may also be constructed as special-shaped (asymmetric) convolution layers; that is, the convolution layer corresponding to at least one feature layer in the edge detection model includes at least two special-shaped convolution layers, and each special-shaped convolution layer corresponds to a special-shaped convolution kernel size. As shown in fig. 6, the convolution layers corresponding to the feature layer Block0_1 are two special-shaped convolution layers whose convolution kernel sizes are 1x5 and 5x1 respectively, for extracting edge feature information with different aspect ratios in different directions.
Based on the above implementation, when the convolutional layer includes at least two special-shaped convolutional layers, the obtaining of the convolution feature image corresponding to each convolutional layer in step 403 may specifically be implemented by:
and on each special-shaped convolution layer, performing convolution on the edge characteristic image corresponding to the corresponding characteristic layer according to the special-shaped convolution kernel size corresponding to the special-shaped convolution layer to obtain a special-shaped characteristic image corresponding to each special-shaped convolution layer.
In the special-shaped convolution layers corresponding to a feature layer, the special-shaped convolution kernel sizes also correspond to the scale parameter of that feature layer. For example, as shown in fig. 6, the convolution layers corresponding to the feature layer Block0_1 are two special-shaped convolution layers with convolution kernel sizes of 1x5 and 5x1, corresponding to the scale parameter 1/1; similarly, the convolution layers corresponding to the feature layer Block1_0 are two special-shaped convolution layers with convolution kernel sizes of 1x3 and 3x1, corresponding to the scale parameter 1/2. The convolution layers corresponding to the other three feature layers Block2_1, Block3_2 and Block5_2 are conventional convolution layers, and their convolution kernel sizes correspond to the scale parameters of the corresponding feature layers, namely 3x3, 1x1 and 1x1. In this case, there are 9 feature layers in total.
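By way of illustration only, one special-shaped convolution stage with a 1xk branch and a kx1 branch could be sketched as follows; the channel numbers are assumptions.

```python
import torch.nn as nn

class AsymmetricSideConv(nn.Module):
    """A side convolution split into two special-shaped (asymmetric) branches,
    e.g. 1x5 and 5x1 for Block0_1 as in FIG. 6."""

    def __init__(self, in_ch: int = 16, out_ch: int = 8, k: int = 5):
        super().__init__()
        self.conv_1xk = nn.Conv2d(in_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2))
        self.conv_kx1 = nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0))

    def forward(self, edge_feature_image):
        # two special-shaped feature images, capturing edge information of
        # different aspect ratios in the horizontal and vertical directions
        return self.conv_1xk(edge_feature_image), self.conv_kx1(edge_feature_image)
```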
Under the condition that the convolution layer corresponding to a feature layer comprises a plurality of special-shaped convolution layers, a plurality of special-shaped feature images are correspondingly obtained on that feature layer, and the sizes of the special-shaped feature images differ from the target size, so that corresponding deconvolution parameters are adopted during deconvolution. Therefore, in the case that the convolution layer corresponding to the deconvolution layer includes at least two special-shaped convolution layers, step 404 can be implemented in the following manner when obtaining the size reduction image corresponding to each deconvolution layer:
firstly, deconvolution is carried out on the special-shaped characteristic images according to deconvolution parameters corresponding to the special-shaped convolution kernel sizes respectively, so as to obtain special-shaped restored images corresponding to the special-shaped convolution layers. And then, fusing the special-shaped restored images according to the image channels to obtain the size restored images corresponding to the deconvolution layer.
Besides the size to be restored, such as the target size, the deconvolution parameters corresponding to the special-shaped convolution kernel sizes may also contain the number of deconvolution operations, which is related to the scale parameter. For example, in the case where the scale parameter is 1/1, the number of deconvolution operations is 1; where the scale parameter is 1/2, the number is 2; where the scale parameter is 1/4, the number is 4; where the scale parameter is 1/8, the number is 8; and where the scale parameter is 1/16, the number is 16.
On this basis, on the deconvolution layer, deconvolution processing is performed the corresponding number of times on the special-shaped feature image output by each special-shaped convolution layer according to the target size, so as to obtain, for each special-shaped convolution layer, a special-shaped restored image matching the target size. As shown in fig. 7, the convolution layers corresponding to the feature layer Block0_1 are two special-shaped convolution layers; after convolution according to their respective convolution kernel sizes, two special-shaped feature images are obtained, the two special-shaped feature images are deconvolved according to the target size of 256x256 to obtain two special-shaped restored images, and the two special-shaped restored images are then fused according to the image channel to obtain the size reduction image corresponding to that deconvolution layer. Similarly, the convolution layers corresponding to the feature layer Block1_0 are two special-shaped convolution layers; after convolution according to their respective convolution kernel sizes, two special-shaped feature images are obtained, the two special-shaped feature images are deconvolved a first time and then a second time according to the target size of 256x256 to obtain two 256x256 special-shaped restored images, and the two special-shaped restored images are fused according to the image channel to obtain the size reduction image corresponding to that deconvolution layer. For the conventional convolution layers corresponding to Block2_1, Block3_2 and Block5_2, after the corresponding edge feature images are convolved according to their respective convolution kernel sizes, the resulting convolution feature images are deconvolved according to the corresponding deconvolution parameters, namely enlarged by factors of 4, 8 and 16 respectively, so as to obtain the corresponding size reduction image on each deconvolution layer.
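By way of illustration only, the restoration and channel-wise fusion of the two special-shaped branches could be sketched as follows; the use of one 2x transposed convolution per factor of two is an assumed realization, since the disclosure only fixes the restored size.

```python
import math
import torch
import torch.nn as nn

class AsymmetricRestore(nn.Module):
    """Restore the two special-shaped feature images of one feature layer to the
    target size with repeated 2x deconvolutions, then fuse the two special-shaped
    restored images along the image channel."""

    def __init__(self, ch: int = 8, scale_denominator: int = 2):
        super().__init__()
        steps = int(math.log2(scale_denominator))      # 1/2 -> 1 step, 1/4 -> 2 steps, ...
        self.up = nn.Sequential(*[
            nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=2, padding=1)
            for _ in range(steps)
        ]) if steps > 0 else nn.Identity()

    def forward(self, shaped_a, shaped_b):
        restored_a = self.up(shaped_a)                  # special-shaped restored image 1
        restored_b = self.up(shaped_b)                  # special-shaped restored image 2
        return torch.cat([restored_a, restored_b], dim=1)  # size reduction image of this layer
```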
In one implementation, the edge detection model described above can be trained by:
step 801: training samples are obtained.
The training samples may comprise multiple groups, and in this embodiment the edge detection model is trained with each group of training samples in turn. Specifically, each group of training samples includes a frame of training image and a frame of standard image. The training image is an image obtained by collecting a training object, as shown in fig. 9; the standard image includes a plurality of pixel point standard results, and each pixel point standard result represents whether the corresponding pixel point is an edge pixel point of the training object. Specifically, the pixel point standard results in the standard image may be labeled manually; for example, the edge pixel points of the training object are manually labeled on the standard image corresponding to the training image, a pixel point standard result of 1 indicates an edge pixel point of the training object, and a pixel point standard result of 0 indicates a non-edge pixel point.
Step 802: and obtaining a loss function value of the edge detection model under the model parameters by using the training image and the standard image.
Wherein the model parameters at least comprise: a hierarchical weight parameter for the feature layer; the hierarchical weight parameter is understood as a weight parameter used by the edge detection model to extract the edge feature at each feature level. In addition, the model parameters also include other weight parameters, such as weight parameters of other computation layers in the backbone network in the edge detection model, and the like.
Specifically, in this embodiment, the loss function value may be represented by a pixel error value between the training image and the standard image at each pixel point. Further, in order to improve the training efficiency and the accuracy, in this embodiment, an error value of the prediction probability value that each pixel is predicted as an edge pixel may be added to the pixel error value, so as to expand the loss function value, thereby achieving the purpose of improving the training efficiency and the accuracy.
Step 803: and adjusting the model parameters according to the loss function values so that the loss function values meet the model convergence condition.
Wherein, the model convergence condition may be: the variation of the loss function values obtained in two adjacent times is smaller than the convergence threshold. For example, the loss function value approaches to 0 infinitely or approaches to a certain value infinitely, so as to indicate that the model parameters in the edge detection model are adjusted to be optimal, and the edge detection can be performed accurately and efficiently.
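By way of illustration only, the convergence check could be sketched as follows; the threshold value is an assumed example.

```python
def has_converged(prev_loss: float, curr_loss: float, threshold: float = 1e-4) -> bool:
    """Model convergence condition: the change of the loss function value between
    two adjacent iterations is smaller than the convergence threshold."""
    return abs(prev_loss - curr_loss) < threshold
```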
In one implementation, when obtaining the loss function value of the edge detection model under the model parameters by using the training image and the standard image in step 802, the following method may be implemented, as shown in fig. 10:
step 1001: and obtaining a pixel error value between the training image and the standard image on each pixel point according to the model parameters in the edge detection model.
Specifically, the training image is input into the edge detection model. Based on the current model parameters, the edge detection model extracts edge features of the training image on the plurality of feature layers with different scale parameters respectively, so as to obtain edge feature images of different scales; the edge feature images are then convolved on the convolution layers according to the convolution kernel sizes corresponding to the scale parameters, so as to obtain the convolution feature image corresponding to each convolution layer; the convolution feature images are then deconvolved on the deconvolution layers according to the deconvolution parameters corresponding to the scale parameters, so as to obtain the size reduction image corresponding to each deconvolution layer; finally, the size reduction images corresponding to the deconvolution layers are fused on the fusion layer according to the image channel to obtain the fusion feature image, and the fusion feature image is convolved to obtain an edge image of a single image channel. The edge detection model processes the training image on each layer with reference to the processing of the target image described above, and the details are not repeated here.
Based on this, in this embodiment, the size reduction image corresponding to each feature layer of the training image may be compared with the standard image according to the pixel values of the contained pixels, for example by subtracting the pixel values, so as to obtain the pixel difference value between the training image and the standard image at each pixel point on each feature layer; for each pixel point, the pixel difference values corresponding to all the feature layers are then fused, for example averaged, so as to obtain the pixel error value at that pixel point. The pixel error value at each pixel point between the training image and the standard image is shown in the following formula (1):
L_mul(F_b, F_s) = Σ_{k=1..K} λ_k · ℓ^(k)(F_b, f_k)        (1)
wherein F_b is the hierarchical weight parameter of the computation layers of the backbone network, F_s is the hierarchical weight parameter of the multi-scale feature layers, with F_s = (f_1, ..., f_K), K is the number of multi-scale feature layers, such as the 9 feature layers shown in FIG. 7, λ_k is the coefficient corresponding to each feature layer, and ℓ^(k) denotes the error obtained on the k-th feature layer.
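By way of illustration only, the fusion of per-feature-layer errors with the coefficients λ_k could be sketched as follows; the per-layer error function is deliberately left open.

```python
def multi_scale_pixel_error(per_layer_maps, standard_image, layer_coeffs, per_layer_error):
    """Sketch in the spirit of formula (1): the errors obtained on the K feature
    layers are fused with the coefficients lambda_k. `per_layer_error` may be,
    for example, a per-pixel difference or the class-balanced term of formula (2)."""
    return sum(lam * per_layer_error(m, standard_image)
               for lam, m in zip(layer_coeffs, per_layer_maps))

# Example with a mean absolute pixel difference as the per-layer error:
# loss_mul = multi_scale_pixel_error(maps, label, [1.0] * len(maps),
#                                    lambda m, y: (m - y).abs().mean())
```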
Step 1002: and obtaining a probability error value between the training image and the standard image on each pixel point according to the model parameters in the edge detection model.
The probability error value is the error value of the predicted probability value of the edge pixel point of the training object predicted by each pixel point.
Specifically, in the embodiment, when obtaining the probability error value, the following may be implemented, as shown in fig. 11:
step 1101: and obtaining a first prediction probability value corresponding to each feature layer on each pixel point in the training image according to the model parameters in the edge detection model.
The first prediction probability value is the prediction probability value of the edge pixel point of each pixel point which is predicted as the training object on the corresponding feature layer.
For example, in this embodiment, a sigmoid function may be used to quantize the output value of the activation function A of each pixel point in the size reduction image corresponding to each feature layer of the training image into a probability value, that is, the first prediction probability value. The first prediction probability value, and the class-balanced error term built from it on each feature layer, may be expressed as shown in the following formula (2):
ℓ^(k)(F_b, f_k) = -(1 - γ) · Σ_{n∈Y_1} log P(y_n = 1 | X; F_b, f_k) - γ · Σ_{n∈Y_0} log P(y_n = 0 | X; F_b, f_k)        (2)
wherein γ = |Y_1|/|Y| and 1 - γ = |Y_0|/|Y|, with Y_1 and Y_0 respectively representing the labeled sets of edge pixel points and non-edge pixel points in the standard image, and P(y_n = 1 | X) = sigmoid(A_n); that is, the sigmoid function quantizes the output value of the activation function A at the pixel position n of the corresponding image into a probability, so as to obtain the probability that the pixel point is an edge or non-edge pixel point.
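By way of illustration only, the class-balanced term of formula (2) could be sketched as follows; the assignment of γ and (1 - γ) to the two sums is an assumption of the sketch.

```python
import torch

def class_balanced_bce(activation, standard_image, eps: float = 1e-6):
    """Class-balanced cross-entropy on one feature layer. `activation` is the
    activation map A (N x 1 x H x W); `standard_image` holds 1 for edge pixel
    points and 0 otherwise."""
    prob = torch.sigmoid(activation)            # first prediction probability values
    y = standard_image.float()
    gamma = y.mean()                            # |Y_1| / |Y|, fraction of edge pixel points
    pos = -(1.0 - gamma) * (y * torch.log(prob + eps)).sum()
    neg = -gamma * ((1.0 - y) * torch.log(1.0 - prob + eps)).sum()
    return pos + neg
```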
Step 1102: and aiming at each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of the edge feature point of each pixel point predicted as the training object.
In this embodiment, for each pixel point, the first prediction probability values on all the corresponding feature layers may be averaged to obtain the second prediction probability value with which the pixel point is predicted as an edge feature point of the training object. Alternatively, in order to improve accuracy, for each pixel point the first prediction probability values on all the feature layers may be weighted and summed using the probability weight parameters of the feature layers in the model parameters, such as the probability weight values c_1, ..., c_K corresponding to the K feature layers, so as to obtain the second prediction probability value with which the pixel point is predicted as an edge feature point of the training object. For example, the second prediction probability value is used with the distance function in the following formula (3):
L_fuse = Dist(Y, Ŷ_fuse),   with   Ŷ_fuse = sigmoid( Σ_{k=1..K} c_k · A_n^(k) )        (3)
that is, the output values of the activation function A at the pixel position n of the training image are weighted according to the weight parameters c_k corresponding to the different feature layers, and the weighted sum is quantized into a probability value, namely the second prediction probability value, by the sigmoid function; Dist(·, ·) denotes a distance function.
The probability weight parameters of the feature layers are preset values, and the magnitude of the weight values in the probability weight parameters represents the influence degree of the scale parameters corresponding to the corresponding feature layers on probability prediction, and is different from the level weight parameters of the feature layers.
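By way of illustration only, the weighted fusion into the second prediction probability value could be sketched as follows.

```python
import torch

def fused_probability(per_layer_activations, probability_weights):
    """Second prediction probability value: the per-feature-layer activations at
    each pixel position are weighted by the probability weight parameters c_k,
    summed, and quantized into a probability with the sigmoid function."""
    weighted = sum(c * a for c, a in zip(probability_weights, per_layer_activations))
    return torch.sigmoid(weighted)
```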
Step 1103: and obtaining a probability error value between the training image and the standard image on each pixel point according to the pixel point standard result in the standard image and the second prediction probability value.
For example, a 1 in the pixel point standard result is regarded as a probability value of 100%, and a 0 is regarded as a probability value of 0. Based on this, a difference is calculated between each pixel point standard result in the standard image and the second prediction probability value of the corresponding pixel point, specifically by using a distance function, so as to obtain the probability error value between the training image and the standard image at each pixel point.
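By way of illustration only, the probability error value could be sketched as follows; binary cross-entropy is an assumed choice for the distance function.

```python
import torch.nn.functional as F

def probability_error(fused_prob, standard_image):
    """Probability error value between the training image and the standard image:
    a distance between the pixel point standard results (treated as probabilities
    of 1.0 or 0.0) and the second prediction probability values."""
    return F.binary_cross_entropy(fused_prob, standard_image.float(), reduction='sum')
```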
Step 1003: and obtaining a loss function value of the edge detection model under the model parameters according to the pixel error value and/or the probability error value.
It should be noted that, in this embodiment, a pixel error value on each pixel point between the training image and the standard image may be directly used as a loss function value, so as to adjust the model parameter. In order to further improve the training accuracy, in this embodiment, the pixel error value may be expanded, and the probability error value is added to the structure of the loss function value, so as to train the model parameter by combining more factors, thereby achieving the purpose of improving the training accuracy.
In a specific implementation, the loss function value is expressed as L_mul + L_fuse, in which
L_mul = Σ_{k=1..K} λ_k · ℓ^(k)(F_b, f_k)
is the pixel error value, and
L_fuse = Dist(Y, Ŷ_fuse)
is the probability error value, wherein Y is the set of pixel point standard results in the standard image, and Ŷ_fuse is the set of second prediction probability values with which the pixel points corresponding to the training image are predicted as edge feature points of the training object.
Based on the above implementation, in this embodiment, after the loss function value is obtained, the model parameters of the edge detection model are adjusted to implement model training. Specifically, stochastic gradient descent may be used to train the model parameters during model training.
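By way of illustration only, one training step combining the two error terms with stochastic gradient descent could be sketched as follows; the model interface and the use of binary cross-entropy for both terms are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, training_image, standard_image):
    """One training step with total loss L_mul + L_fuse. `model` is assumed to
    return the per-layer size-restored activations and the fused activation."""
    optimizer.zero_grad()
    per_layer_acts, fused_act = model(training_image)
    target = standard_image.float()
    loss_mul = sum(F.binary_cross_entropy_with_logits(a, target) for a in per_layer_acts)
    loss_fuse = F.binary_cross_entropy_with_logits(fused_act, target)
    loss = loss_mul + loss_fuse
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```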
Further, after the training of the edge detection model is completed, the edge detection model may be tested. For example, a test image is input into the edge detection model to obtain the result output by the edge detection model; the result is compared with the standard image corresponding to the test image to obtain a loss function value, and it is determined whether the loss function value satisfies the model convergence condition. For example, in the testing stage, an image containing a document is input into the edge detection model to obtain the output results corresponding to the feature layers of different scales; these results are fused according to the trained weight parameters and then quantized into a binary image according to a threshold value as the final prediction result; a loss function value between the prediction result and the standard image is then obtained, and whether the edge detection model passes the test is determined by judging whether this loss function value satisfies the model convergence condition.
In addition, in order to enable the trained edge detection model to occupy fewer computing resources on an electronic device with weaker computing capability, processing such as model channel pruning may be performed on the edge detection model in this embodiment, so as to reduce its occupation of resources such as memory in the electronic device and to accelerate its running speed.
Referring to fig. 12, a schematic structural diagram of an edge detection apparatus according to the second embodiment of the present disclosure is provided, where the apparatus may be configured in an electronic device capable of performing data processing, such as a mobile phone, a pad, a notebook, a computer, or a server. The technical scheme in the embodiment is mainly used for improving the accuracy of edge detection.
Specifically, the apparatus in this embodiment may include the following units:
an image obtaining unit 1201, configured to obtain a target image, where the target image is an image obtained by collecting a target object;
an image processing unit 1202, configured to input the target image into an edge detection model to obtain an edge image output by the edge detection model, where the edge image includes a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
It can be seen from the foregoing technical solutions that, in the edge detection device provided in the second embodiment of the present application, a plurality of feature layers with different scale parameters are built in an edge detection model, and then edge feature information on different scales is extracted from the edge detection model through the feature layers, and then edge pixel points of a target object included in a target image are detected by using the edge feature information on different scales. Therefore, in the application, different from the implementation scheme of detecting the edge pixel points of the target object by using the edge feature information of a single scale, the edge pixel points of the target object are detected by using the edge feature information of different scales, so that the accuracy of edge detection is improved.
In one implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 13:
a model building unit 1203, configured to build an edge detection model, where the built edge detection model includes, in addition to a plurality of feature layers with different scale parameters, a fusion layer, convolution layers corresponding to each feature layer, and deconvolution layers corresponding to each feature layer;
the edge detection model outputs an edge image in the following way:
respectively extracting edge features of the target image according to different scale parameters on each feature layer in the edge detection model to obtain edge feature images of different scales; on the convolution layer corresponding to each feature layer, convolving the edge feature image according to the convolution kernel size corresponding to the scale parameter to obtain a convolution feature image corresponding to each convolution layer; on the deconvolution layer corresponding to each feature layer, respectively deconvolving the convolution feature images according to the deconvolution parameters corresponding to the scale parameters to obtain a size reduction image corresponding to each deconvolution layer; on the fusion layer, fusing the size reduction images corresponding to each deconvolution layer according to an image channel to obtain a fusion characteristic image; and performing convolution processing on the fusion characteristic image to obtain an edge image of a single image channel, where the edge image includes a plurality of pixel point detection results, each pixel point detection result corresponds to a pixel point in the target image, a pixel point detection result taking a first pixel value indicates that the corresponding pixel point is an edge pixel point of the target object, and a pixel point detection result taking a second pixel value indicates that the corresponding pixel point is a non-edge pixel point.
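For readers who prefer code, the following is a minimal PyTorch-style sketch of the output pipeline just described (per-layer convolution, deconvolution back to the input size, channel-wise fusion, and a final single-channel convolution). The channel counts, kernel sizes and the 0.5 threshold are illustrative assumptions, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of: per-scale conv -> deconv to input size -> channel concat -> 1x1 conv -> binarize."""

    def __init__(self, in_channels=(16, 24, 40, 96, 160), scales=(1, 2, 4, 8, 16), mid=8):
        super().__init__()
        # One convolution per feature layer (the patent ties the kernel size to the scale; 3x3 here).
        self.convs = nn.ModuleList([nn.Conv2d(c, mid, kernel_size=3, padding=1) for c in in_channels])
        # One deconvolution per feature layer, restoring the feature map to the target image size.
        self.deconvs = nn.ModuleList([
            nn.Identity() if s == 1 else
            nn.ConvTranspose2d(mid, mid, kernel_size=2 * s, stride=s, padding=s // 2)
            for s in scales
        ])
        self.fuse = nn.Conv2d(mid * len(scales), 1, kernel_size=1)  # single-image-channel edge map

    def forward(self, feature_maps):
        restored = [dec(conv(f)) for f, conv, dec in zip(feature_maps, self.convs, self.deconvs)]
        fused = torch.cat(restored, dim=1)          # fusion along the image-channel dimension
        prob = torch.sigmoid(self.fuse(fused))      # per-pixel probability of being an edge pixel
        return (prob > 0.5).float()                 # first pixel value 1 = edge, second value 0 = non-edge

# Usage with dummy multi-scale feature maps for a 256x256 target image:
feats = [torch.randn(1, c, 256 // s, 256 // s) for c, s in zip((16, 24, 40, 96, 160), (1, 2, 4, 8, 16))]
edge_image = FusionHead()(feats)                    # shape (1, 1, 256, 256), values in {0., 1.}
```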
In one implementation, in the edge detection model, at least one convolution layer comprises at least two special-shaped convolution layers, and the special-shaped convolution layers correspond to the special-shaped convolution kernel size;
under the condition that the convolutional layer comprises at least two special-shaped convolutional layers, the edge detection model respectively convolves the edge characteristic images on the convolutional layer corresponding to each characteristic layer according to the size of a convolution kernel corresponding to the scale parameter so as to obtain a convolution characteristic image corresponding to each convolutional layer, and the method comprises the following steps of: convolving the edge characteristic images corresponding to the corresponding characteristic layers according to the sizes of the special-shaped convolution kernels corresponding to the special-shaped convolution layers respectively to obtain special-shaped characteristic images corresponding to each special-shaped convolution layer;
under the condition that the convolution layer corresponding to the deconvolution layer comprises at least two special-shaped convolution layers, the edge detection model, on the deconvolution layer corresponding to each feature layer, respectively deconvolves the convolution feature images according to the deconvolution parameters corresponding to the scale parameters to obtain the size reduction image corresponding to each deconvolution layer, and the method comprises the following steps: deconvoluting the special-shaped feature images according to deconvolution parameters corresponding to the special-shaped convolution kernel sizes to obtain special-shaped restored images corresponding to the special-shaped convolution layers; and fusing the special-shaped restored images according to the image channels to obtain the size reduction image corresponding to the deconvolution layer.
In one implementation manner, the present embodiment may further include the following units, as shown in fig. 13:
a model training unit 1204, configured to train an edge detection model, where the edge detection model is obtained by training in the following manner:
obtaining a training sample, wherein the training sample comprises a training image and a standard image, the training image is an image obtained by collecting a training object, the standard image comprises a plurality of pixel point standard results, and the pixel point standard results represent whether corresponding pixel points are edge pixel points of the training object; obtaining a loss function value of the edge detection model under the model parameters by using the training image and the standard image; the model parameters at least comprise: a hierarchical weight parameter for the feature layer; and adjusting the model parameters according to the loss function values so that the loss function values meet the model convergence condition.
Optionally, when the model training unit 1204 uses the training image and the standard image to obtain the loss function value of the edge detection model under the model parameter, it is specifically configured to: obtaining a pixel error value between the training image and the standard image on each pixel point according to the model parameters in the edge detection model; obtaining a probability error value between the training image and the standard image on each pixel point according to model parameters in the edge detection model, wherein the probability error value is an error value of a prediction probability value of the edge pixel point of the training object predicted by each pixel point; and obtaining a loss function value of the edge detection model under the model parameters according to the pixel error value and/or the probability error value.
Optionally, when the model training unit 1204 obtains a probability error value between the training image and the standard image at each pixel point according to the model parameters in the edge detection model, it is specifically configured to: according to model parameters in the edge detection model, obtaining a first prediction probability value corresponding to each feature layer on each pixel point in the training image, wherein the first prediction probability value is the prediction probability value of the edge pixel point of each pixel point predicted as a training object on the corresponding feature layer; for each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of the edge feature point of each pixel point predicted as a training object; and obtaining a probability error value between the training image and the standard image on each pixel point according to the pixel point standard result in the standard image and the second prediction probability value.
Further, the model parameters further include: probability weight parameters of the feature layer;
the model training unit 1204 is configured to, for each pixel, fuse the first prediction probability values on all the feature layers to obtain a second prediction probability value of the edge feature point of the training object, where each pixel is predicted as: and for each pixel point, performing weighted summation on the first prediction probability values on all the feature layers by using the probability weight parameters of the feature layers to obtain a second prediction probability value of the edge feature point of each pixel point predicted as a training object.
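To make the training flow concrete, here is a minimal sketch of how the model parameters could be adjusted against the loss until convergence, assuming a PyTorch model that returns a per-pixel edge probability map and a dataloader yielding (training image, standard image) pairs; plain binary cross entropy stands in for the weighted and fused loss described further below.

```python
import torch

def train_edge_model(model, loader, epochs=10, lr=1e-2):
    # Conventional stochastic gradient descent, as mentioned for the training strategy.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.BCELoss()
    for _ in range(epochs):
        for train_img, standard_img in loader:      # standard image: 1 = edge pixel, 0 = non-edge
            prob = model(train_img)                  # predicted probability map (before thresholding)
            loss = criterion(prob, standard_img)     # loss function value under the current model parameters
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                         # adjust the model parameters toward convergence
    return model
```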
In one implementation, before obtaining the edge feature images of different scales, the edge detection model is further configured to: processing the target image into an image of a target size; and the number of pixel point detection results contained in the edge image is matched with the target size.
It should be noted that, for the specific implementation of each unit in the present embodiment, reference may be made to the corresponding content in the foregoing, and details are not described here.
Referring to fig. 14, a schematic structural diagram of an electronic device according to a third embodiment of the present disclosure is provided, where the electronic device may be an electronic device capable of performing data processing, such as a mobile phone, a pad, a notebook, a computer, or a server. The technical scheme in the embodiment is mainly used for improving the accuracy of edge detection.
Specifically, the electronic device in this embodiment may include the following structure:
a memory 1401 for storing an application program and data generated by the execution of the application program;
a processor 1402 for executing an application to implement the following functions: acquiring a target image, wherein the target image is an image acquired by acquiring a target object; inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on a target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining an edge image.
According to the technical scheme, in the electronic device provided by the third embodiment of the application, the plurality of feature layers with different scale parameters are constructed in the edge detection model, so that edge feature information on different scales is extracted from the edge detection model through the feature layers, and edge pixel points of a target object contained in a target image are detected by using the edge feature information on different scales. Therefore, in the application, different from the implementation scheme of detecting the edge pixel points of the target object by using the edge feature information of a single scale, the edge pixel points of the target object are detected by using the edge feature information of different scales, so that the accuracy of edge detection is improved.
Taking as an example an electronic device, such as a computer, that needs to perform edge detection on a document image: in the field of document image preprocessing, the technical scheme provided by the present application aims to improve the accuracy and speed of the document edge contour extraction algorithm used during document image preprocessing. According to this scheme, for an input document image, the document contour is depicted at multiple scales and dimensions, the detection results of all scales are fused at the channel level in a multi-level manner, and the results are finally merged into the edge contour of the document, namely the edge image described above. The technical scheme of the present application can provide accurate prior information for subsequent calibration operations such as document cropping and scaling, has good robustness and running speed, and offers a clear improvement in comprehensive performance compared with other popular document edge detection algorithms.
First, in the course of studying current document image edge detection schemes, the inventors of the present application found the following:

Edge detection algorithms fall into conventional image processing algorithms and algorithms based on deep learning. The main idea of conventional image processing algorithms is to filter the image with manually designed, edge-sensitive filters, removing non-edge pixels while keeping the pixel points on edge contours. Representative algorithms include Sobel, Canny and Marr-Hildreth, among others. Conventional methods work well on simple scene images and can basically extract a complete document contour. Real scene images, however, are far more complicated and contain various interference factors, so the edge detection results obtained by conventional methods are much messier than in simple scenes: many line segments of various lengths may be detected, or the edge line of the document may be cut into several short segments separated by gaps of different widths. When contour fitting is subsequently performed on these edge lines, many contours of non-document areas are found, which seriously degrades the performance of the algorithm.
Emerging end-to-end edge detection algorithms based on deep learning improve on conventional algorithms to a certain extent and are more robust in complex scenes, but because deep learning models are large and slow, the prediction latency for a single image cannot meet the requirements of actual scenes. Since document preprocessing is only a small part of the overall document processing pipeline, the requirements on its accuracy and speed are in practice even higher; in addition, most preprocessing algorithms run in mobile-terminal apps, and the computing power a mobile device can provide falls short of what a deep learning model demands. Therefore, a document edge detection algorithm that is fast, accurate and small in model size is an urgent problem to be solved in the industry.
In view of this, the inventors of the present application propose a scheme to address the poor accuracy and robustness of conventional image processing methods: a model pre-trained on an advanced million-level image classification task is used as the basic backbone network; within the constructed edge detection model, 5 convolutional layers of different scales are selected from the network to serve as basic edge feature layers, and additional convolution layers with different convolution kernel sizes are stacked on these basic edge feature layers to adapt to the edge structure of a document; the features of different scales are then fused, and the fused result is filtered with a set threshold value to form the final document edge detection result. Further, to address the low running speed of deep-learning-based models, the inventors adopt the MobileNetV3 network architecture, which is friendly to mobile-terminal computing power, as the basis, and compress the actual model weight file to only 1.5M using model compression, pruning and quantization techniques, so that the computing cost is significantly lower than that of algorithms relying on a server-side GPU.
With reference to the model structure shown in fig. 15, a specific embodiment of the edge detection model is described as follows:
1. Network structure of the model:

First, DocEdgeNet below denotes the edge detection model in the present application, i.e., the document edge detection algorithm. As shown in fig. 15, DocEdgeNet is a multi-scale, multi-fusion network structure. For each input RGB color image, the algorithm first scales it to a size of 256x256, and then uses 5 feature layers of different scales to depict the edge features of the document outline from different scales. The feature map sizes of the 5 feature layers are 1/1, 1/2, 1/4, 1/8 and 1/16 of the input, respectively; the features of the different layers are extracted and fused along the image channel direction, and finally a feature map of the same size as the input image (i.e., the target image in the foregoing) is generated through a 1x1 convolution operation (i.e., the feature map is convolved with a convolution kernel of size 1x1) to represent the final result. The output of the algorithm model is a binarized image (i.e., the edge image in the foregoing), in which pixels on the edge have value 1 and all other pixels have value 0 (representing the background other than the edge).
2. Constructing a feature extraction layer:

The feature layers are constructed on the basis of a specific backbone network (backbone). Popular large-scale image classification networks such as VGGNet, ResNet, Inception and Xception can be selected; these networks have high accuracy, but their model size is large and their inference speed is low. In contrast, a lightweight network designed for mobile-terminal computing power, such as the MobileNet series, has a small model size; compared with a large model its accuracy drops slightly, but its inference speed increases markedly, making it more suitable for scenes with high real-time requirements.
In the present application, feature layers of different scales are designed to depict and process the document edge features, as shown in fig. 15. For ease of understanding, only the layers used for feature extraction are drawn in fig. 15; other layers and some convolution layers used only for computation are omitted. Wherein:

Block0_1 is a bottleneck layer of the same size as the input image. The primary purpose of this layer is to capture the edges of document regions in the image at a coarser granularity, avoiding the loss of feature information at larger scale levels. In addition, considering that most documents are rectangles with a certain aspect ratio, two special-shaped convolution layers are branched off after the Block0_1 layer in the present application, using convolution kernels of sizes 1x5 and 5x1 respectively, so as to depict edges with different aspect ratios in different directions.

A similar operation is performed on the subsequent Block1_0 layer; since this layer is half the size of the input image, smaller special-shaped convolution kernels are used for the convolution operations, namely 1x3 and 3x1 convolutions respectively.

The sizes of Block2_1, Block3_2 and Block5_2 are 1/4, 1/8 and 1/16 of the original input image size, respectively. Considering that the feature maps are already sufficiently small at these scales, conventional kernel shapes such as 3x3 and 1x1 are adopted for the convolution kernels.
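As a concrete illustration of the special-shaped branches described above, the following PyTorch sketch shows how the 1x5/5x1 and 1x3/3x1 convolutions could be declared; the channel counts are assumptions and are not taken from the patent.

```python
import torch.nn as nn

# Block0_1 keeps the input resolution, so the wider 1x5 / 5x1 kernels trace long horizontal
# and vertical document edges; Block1_0 is half the resolution, so 1x3 / 3x1 kernels suffice.
block0_1_branches = nn.ModuleList([
    nn.Conv2d(16, 16, kernel_size=(1, 5), padding=(0, 2)),   # horizontal-direction edges
    nn.Conv2d(16, 16, kernel_size=(5, 1), padding=(2, 0)),   # vertical-direction edges
])
block1_0_branches = nn.ModuleList([
    nn.Conv2d(24, 24, kernel_size=(1, 3), padding=(0, 1)),
    nn.Conv2d(24, 24, kernel_size=(3, 1), padding=(1, 0)),
])
```

The padding is chosen so each branch preserves the spatial size of its parent layer, which keeps the later channel-dimension fusion straightforward.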
3. Feature multiscale deconvolution and feature fusion:

After the multi-scale feature extraction operation is applied to the input image, feature maps of different sizes are obtained, representing the document edge features at different scales. Since the output of the network must match the size of the input, the feature maps of different scales need to be up-sampled for scale expansion. Considering that the weights in the original backbone network were trained for image classification rather than specifically for edge contour features, the scale expansion strategy adopted in this application is deconvolution, so that the parameters of each deconvolution layer are learnable. Deconvolution is a special forward convolution: the feature map is enlarged by padding in blank feature points (for example, zeros) in a certain proportion, the convolution kernel is rotated, and a forward convolution is then performed.

For the two special-shaped convolution branches obtained from the Block0_1 layer, each is first restored to the same size as Block0_1 through a deconvolution operation and then fused with the Block0_1 layer along the channel dimension (depth-wise concatenation);

Similarly, the two branches of Block1_0 are restored to the same size as Block1_0 in the same manner. Further, since Block1_0 is 1/2 the size of the input image, a second deconvolution is needed to enlarge it by a factor of 2. Finally, Block1_0 and its corresponding restored branches are fused along the channel dimension;

Similarly, for Block2_1, Block3_2 and Block5_2, their feature maps are restored to the input image size using deconvolution, and finally all results are fused along the channel dimension.

It should be noted that the convolution kernels used to deconvolve feature maps of different sizes are themselves of different sizes: the larger the required magnification factor, the larger the convolution kernel used. In effect, these three layers are enlarged by 4, 8 and 16 times respectively, so that the feature maps obtained at every scale end up the same size.
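The rule "larger magnification, larger kernel" can be made concrete with one common deconvolution sizing convention: kernel = 2×factor, stride = factor, padding = factor/2, which enlarges the feature map by exactly the required factor for even factors. This particular sizing is an assumption for illustration, not a parameter stated in the patent.

```python
import torch
import torch.nn as nn

def upsample_deconv(channels: int, factor: int) -> nn.ConvTranspose2d:
    # Learnable deconvolution that enlarges height and width by `factor`:
    # output = (H - 1) * factor - 2 * (factor // 2) + 2 * factor = H * factor (for even factors).
    return nn.ConvTranspose2d(channels, channels,
                              kernel_size=2 * factor, stride=factor, padding=factor // 2)

# The 1/4, 1/8 and 1/16 feature maps of a 256x256 input are enlarged by 4, 8 and 16 times:
for size, factor in ((64, 4), (32, 8), (16, 16)):
    x = torch.randn(1, 8, size, size)
    print(upsample_deconv(8, factor)(x).shape)   # each prints torch.Size([1, 8, 256, 256])
```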
The last step of DocEdgeNet is to perform a 1x1 convolution on the fused feature map, which is equivalent to a convolution along the channel direction (depth-wise); the result is a feature map of the same size as the input image but with a single channel. This feature map can be regarded as a two-dimensional matrix whose values are the probabilities that the corresponding pixels belong to the edge region of the document. Transforming the feature map into the actual prediction result only requires quantizing the probability values to 0 and 1 according to a threshold (e.g., rounding). Pixels with value 1 represent the edge, and the rest is irrelevant background.
4. Model training and testing:
given an input image X and a corresponding groudtruth (true value) image Y (where Y is a binarized black-and-white image, the pixel values of the edge in the image are 1, and the others are 0), the training model aims to achieve the purpose of making the document edge predicted by the algorithm model as consistent as possible with the actual groudtruth result with the edge marked.
Let F_b denote the weight parameters in the layers of the backbone network and F_s denote the weight parameters in the multi-scale feature layers, with F_s = (f_1, ..., f_K), where K is the number of multi-scale feature layers (K = 9 in this application). The objective function L used when training the model can be expressed as shown in formula (1), where the loss function L_mul refers to the pixel-level error between the feature map and the ground truth, i.e. the error computed over each pixel between the input image X = (x_n, n = 1, ..., |X|) and the ground truth image Y = (y_n, n = 1, ..., |Y|), y_n ∈ {0, 1}.
However, in most cases the pixels on the edge contour account for no more than 10% of all pixels in the image, which causes a serious class imbalance problem and directly affects the actual effect of the model. Therefore, the present application extends the pixel-level error with a weighted cross entropy to greatly mitigate the performance degradation caused by class imbalance.
Based on the above idea, a class weight parameter γ is introduced to control the relative importance of edge pixels and non-edge pixels when calculating the loss function. L_mul(F_b, F_s) is thereby extended as shown in formula (2), where γ = |Y_1|/|Y| and 1 − γ = |Y_0|/|Y|, with Y_1 and Y_0 denoting the annotated sets of edge pixels and non-edge pixels in the ground truth data, respectively. P(y_n = 1 | X) = sigmoid(A_n), where the sigmoid function quantizes the activation output A_n at pixel position n into a probability, i.e. the probability that the pixel point is an edge or a non-edge pixel. Performing this calculation for all feature layers of different sizes yields the prediction probability of the document edge in the input image based on features of different scales.

Finally, the prediction results of these different feature layers need to be fused. To this end a new trainable weighting parameter c = (c_1, ..., c_K) is introduced to express the proportion of each feature layer when the recognition results are fused, as shown in formula (3). There, the fused prediction at pixel position n is

P_fuse(y_n = 1 | X) = sigmoid( Σ_k c_k · A_n^(k) ),

that is, the activation outputs A_n^(k) at pixel position n are weighted by the weight parameters c_k of the different feature layers, and the weighted sum is then quantized into a probability value by the sigmoid function. The fusion loss L_fuse is obtained by applying a distance function between this fused prediction and the ground truth. The distance function measures the degree of difference between the predicted value and the ground truth; a common Euclidean distance, cross entropy or the like may be used, and such well-known formulas are not listed here one by one.
In summary, the loss function during model training is configured as L_mul + L_fuse. For the training strategy, the parameters in the model are trained using a conventional stochastic gradient descent method. In the testing stage, an image containing a document is input into the model to obtain the output results of the feature layers of different scales; these results are fused according to the trained weights and then quantized into a binary image according to a threshold value to serve as the final prediction result.
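Reading formulas (1)-(3) together, the training loss could be sketched roughly as follows in PyTorch. The direction of the class weighting (the rare edge class weighted by 1 − γ, HED-style) and the use of cross entropy as the distance function are assumptions consistent with the description above, not the patent's verbatim formulas.

```python
import torch

def weighted_bce(prob, target, gamma, eps=1e-7):
    # gamma = |Y1|/|Y| is the (small) fraction of edge pixels; the rare edge class is weighted
    # by (1 - gamma) and the abundant background by gamma (one common balancing convention).
    loss = -((1 - gamma) * target * torch.log(prob + eps)
             + gamma * (1 - target) * torch.log(1 - prob + eps))
    return loss.mean()

def docedgenet_loss(layer_activations, fuse_weights, target):
    """layer_activations: list of K per-layer activation maps A_n, already at the target size."""
    gamma = target.mean()                                   # fraction of edge pixels in the ground truth
    # L_mul: pixel-level weighted cross entropy on each feature layer's own prediction.
    l_mul = sum(weighted_bce(torch.sigmoid(a), target, gamma) for a in layer_activations)
    # L_fuse: weight the activations with the trainable c_k, squash with sigmoid, compare to Y.
    fused_prob = torch.sigmoid(sum(c * a for c, a in zip(fuse_weights, layer_activations)))
    l_fuse = weighted_bce(fused_prob, target, gamma)
    return l_mul + l_fuse

# fuse_weights would typically be trainable, e.g. torch.nn.Parameter(torch.ones(9) / 9).
```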
Further, in the present application, the edge detection model, i.e., DocEdgeNet, is optimized as follows:

Considering that document edge detection is only one link in document image preprocessing and has a high speed requirement, a series of optimizations is needed after the model structure and the algorithm design are complete, so that the comprehensive performance can meet the requirements of the business scenario.

Firstly, in the choice of backbone network, the present application makes modifications on the basis of the lightweight network MobileNetV3. For example, the bottleneck layers from Block1_0 to Block5_2 are retained and the remaining bottleneck layers are pruned away; in addition, two extra bottleneck layers, Block0_0 and Block0_1, are added before Block1_0, which match the size of the input image and are used for describing document edge features at a coarse-grained level.
Secondly, computing the loss function on the feature map of every scale is time-consuming, so the loss function can be adjusted to be computed only on the fused feature image, which reduces the amount of calculation.
Thirdly, the trained DocEdgeNet network is about 40M in size and cannot reach a sufficiently high running speed, so a channel pruning operation is applied to the trained model. Specifically, a threshold range is set for each feature layer, and the number of convolution kernels contained in that layer is reduced within this range, thereby lowering the computational complexity of the network and improving efficiency. The setting of the threshold range depends on the data set and the application scenario and has no deterministic standard. For this reason, 15%-20% of the data is drawn from the training set to serve as a validation set; every time a pruning operation is performed, the change in prediction accuracy and speed is verified on the validation set, and a greedy search is carried out to find the optimal number of convolution kernels for each feature layer, striking a balance between speed and accuracy. After this step, the model size drops substantially to 3M.
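The greedy per-layer search over kernel counts might be organised roughly as below; `evaluate` and `prune_layer_to` are hypothetical helper functions standing in for the actual validation metric and channel-pruning toolchain, and the 1% accuracy-drop tolerance is an assumption.

```python
def greedy_channel_search(model, layers, candidate_counts, evaluate, prune_layer_to, tolerance=0.01):
    """Pick, layer by layer, the smallest kernel count whose accuracy drop stays within tolerance.

    evaluate(model)              -> accuracy on the held-out 15%-20% validation split (assumed helper)
    prune_layer_to(model, l, n)  -> a copy of `model` with layer `l` reduced to n convolution kernels
    """
    base_accuracy = evaluate(model)
    for layer in layers:
        for n in sorted(candidate_counts[layer]):        # try the smallest kernel counts first
            pruned = prune_layer_to(model, layer, n)
            if base_accuracy - evaluate(pruned) <= tolerance:
                model = pruned                            # greedy: accept the first count that is good enough
                break
    return model
```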
In order to further accelerate inference, the present application uses the TensorRT technology to perform inference acceleration on the edge detection model: operators and modules that are not used in the model are deleted, all parameters are converted from variables into constants, and the precision is converted from floating point numbers to integers, finally yielding an inference model of only 1.5M in size. This model is used for inference only.
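The patent does not spell out the conversion toolchain; one common route, sketched below as an assumption, is to export the trained PyTorch model to ONNX and then build a reduced-precision TensorRT engine from that file. The stand-in model, file name and input size are illustrative only.

```python
import torch

# Stand-in for the pruned DocEdgeNet-style network; the real trained model would be used here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 1, kernel_size=3, padding=1), torch.nn.Sigmoid())
model.eval()

dummy_input = torch.randn(1, 3, 256, 256)   # fixed 256x256 RGB input, matching the description above
torch.onnx.export(model, dummy_input, "docedgenet.onnx", opset_version=11,
                  input_names=["image"], output_names=["edge_map"])
# The exported ONNX file can then be converted into a quantized TensorRT engine
# (for example with NVIDIA's trtexec tool) and deployed for inference only.
```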
The foregoing covers the technical details of the present application. Finally, to summarize the whole processing flow: given an input RGB color image, it is first scaled to 256x256, then fed into DocEdgeNet for multi-scale feature extraction and fusion, and the fused result is converted into a binary image for output. In the output image, which contains only the document edge, pixels on the edge are white and all remaining pixels are black.
To sum up:
(1) the application improves on the basis of a lightweight network MobileNet V3. First, the bottleneck layer is reserved starting from Block1_0 to Block5_2, and the rest of the layers are pruned; in addition, two additional bottleneck layers, namely Block0_0 and Block0_1, are added before Block1_0, and are consistent with the size of an input image and used for describing document edge characteristics at a coarse-grained level.
(2) Considering that most documents are rectangles with certain aspect ratios, traditional symmetric convolution kernels such as 3x3 and 5x5 cannot achieve very good effects when describing edge features, so that the application innovatively branches off two special-shaped convolution layers after the Block0_1 layer, and the sizes of the used convolution kernels are 1x5 and 5x1 respectively, and the special-shaped convolution kernels are used for describing edges with different aspect ratios in different directions. Similarly, a similar operation is performed on the subsequent Block1_0 layer, but since the size of the layer is halved compared with the input image, the convolution operations are performed by using the irregular convolution kernels with smaller sizes, i.e., 1x3 and 3x1 convolution respectively.
(3) Aiming at a specific service scene, a series of strategies such as model channel pruning, inference model conversion, parameter quantification and the like are designed, a good effect is achieved on the size of a model and the inference speed, and compared with a large-scale deep learning edge detection model, the method achieves obvious speed improvement at the cost of slight performance loss; compared with the traditional edge detection algorithm based on image processing, the algorithm model provided by the application has good robustness and obviously improved performance, and can obtain the inference speed close to real time on different devices.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An edge detection method, characterized in that the method comprises:
acquiring a target image, wherein the target image is an image acquired by acquiring a target object;
inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
2. The method of claim 1, wherein the edge detection model further comprises a fusion layer, convolution layers corresponding to each of the feature layers, respectively, and deconvolution layers corresponding to each of the feature layers, respectively;
wherein the edge detection model outputs the edge image by:
respectively extracting edge features of the target image according to different scale parameters on each feature layer in the edge detection model to obtain edge feature images of different scales;
on the convolution layer corresponding to each feature layer, performing convolution on the edge feature image according to the convolution kernel size corresponding to the scale parameter to obtain a convolution feature image corresponding to each convolution layer;
on the deconvolution layer corresponding to each feature layer, according to the deconvolution parameters corresponding to the scale parameters, performing deconvolution on the convolution feature images respectively to obtain a size reduction image corresponding to each deconvolution layer;
on the fusion layer, fusing the size reduction images corresponding to each deconvolution layer according to an image channel to obtain a fusion characteristic image;
and performing convolution processing on the fusion characteristic image to obtain an edge image of a single image channel, wherein the edge image comprises a plurality of pixel point detection results, each pixel point detection result corresponds to a pixel point in the target image, a pixel point detection result taking a first pixel value indicates that the corresponding pixel point is an edge pixel point of the target object, and a pixel point detection result taking a second pixel value indicates that the corresponding pixel point is a non-edge pixel point.
3. The method of claim 2, wherein at least one of the convolutional layers in the edge detection model comprises at least two specially shaped convolutional layers, the specially shaped convolutional layers corresponding to a specially shaped convolutional kernel size;
when the convolutional layers include at least two special-shaped convolutional layers, on the convolutional layer corresponding to each feature layer, the edge feature image is convolved according to the size of the convolution kernel corresponding to the scale parameter, so as to obtain a convolution feature image corresponding to each convolutional layer, including:
convolving the edge feature images corresponding to the corresponding feature layers according to the sizes of the special-shaped convolution kernels corresponding to the special-shaped convolution layers respectively to obtain special-shaped feature images corresponding to each special-shaped convolution layer;
wherein, when the convolution layer corresponding to the deconvolution layer includes at least two special-shaped convolution layers, on the deconvolution layer corresponding to each feature layer, according to the deconvolution parameter corresponding to the scale parameter, deconvolution is performed on the convolution feature image, so as to obtain a size reduction image corresponding to each deconvolution layer, the method includes:
deconvoluting the special-shaped characteristic images according to deconvolution parameters corresponding to the sizes of the special-shaped convolution kernels respectively to obtain special-shaped restored images corresponding to the special-shaped convolution layers;
and fusing the special-shaped restored images according to image channels to obtain the size restored images corresponding to the deconvolution layer.
4. The method of claim 1, 2 or 3, wherein the edge detection model is trained by:
obtaining a training sample, wherein the training sample comprises a training image and a standard image, the training image is an image obtained by collecting a training object, the standard image comprises a plurality of pixel point standard results, and the pixel point standard results represent whether corresponding pixel points are edge pixel points of the training object;
obtaining a loss function value of the edge detection model under model parameters by using the training image and the standard image; the model parameters at least comprise: a hierarchical weight parameter for the feature layer;
and adjusting the model parameters according to the loss function values so that the loss function values meet the model convergence condition.
5. The method of claim 4, wherein obtaining the loss function values of the edge detection model under model parameters using the training images and standard images comprises:
obtaining a pixel error value between the training image and the standard image on each pixel point according to the model parameters in the edge detection model;
obtaining a probability error value between the training image and the standard image on each pixel point according to a model parameter in the edge detection model, wherein the probability error value is an error value of a prediction probability value of the edge pixel point of the training object predicted by each pixel point;
and obtaining a loss function value of the edge detection model under model parameters according to the pixel error value and/or the probability error value.
6. The method of claim 5, wherein obtaining a probability error value between the training image and the standard image at each pixel point according to the model parameters in the edge detection model comprises:
according to model parameters in the edge detection model, obtaining a first prediction probability value corresponding to each feature layer on each pixel point in the training image, wherein the first prediction probability value is the prediction probability value of the edge pixel point of the training object predicted by each pixel point on the corresponding feature layer;
for each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of each pixel point which is predicted as an edge feature point of the training object;
and obtaining a probability error value between the training image and the standard image on each pixel point according to the pixel point standard result in the standard image and the second prediction probability value.
7. The method of claim 6, wherein the model parameters further comprise: probability weight parameters of the feature layer;
for each pixel point, fusing the first prediction probability values on all the feature layers to obtain a second prediction probability value of each pixel point which is predicted as the edge feature point of the training object, including:
and for each pixel point, performing weighted summation on the first prediction probability values on all the feature layers by using the probability weight parameters of the feature layers to obtain a second prediction probability value of each pixel point which is predicted as the edge feature point of the training object.
8. The method according to claim 2, wherein before performing edge feature extraction on the target image according to different scale parameters on each feature layer in the edge detection model to obtain edge feature images of different scales, the method further comprises:
processing the target image into an image of a target size;
and the number of pixel point detection results contained in the edge image is matched with the target size.
9. An edge detection apparatus, characterized in that the apparatus comprises:
the image acquisition unit is used for acquiring a target image, and the target image is an image acquired by acquiring a target object;
the image processing unit is used for inputting the target image into an edge detection model so as to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object or not;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
10. An electronic device, comprising:
a memory for storing an application program and data generated by the application program running;
a processor for executing the application program to implement the following functions: acquiring a target image, wherein the target image is an image acquired by acquiring a target object; inputting the target image into an edge detection model to obtain an edge image output by the edge detection model, wherein the edge image comprises a plurality of pixel point detection results, and the pixel point detection results represent whether corresponding pixel points are edge pixel points of the target object;
the edge detection model is constructed based on a backbone network and at least comprises a plurality of feature layers, edge feature extraction is carried out on the target image according to different scale parameters on each feature layer to obtain edge feature information, and the edge feature information extracted on each feature layer is used for obtaining the edge image.
CN202111080227.8A 2021-09-15 2021-09-15 Edge detection method and device and electronic equipment Pending CN113781510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111080227.8A CN113781510A (en) 2021-09-15 2021-09-15 Edge detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111080227.8A CN113781510A (en) 2021-09-15 2021-09-15 Edge detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113781510A true CN113781510A (en) 2021-12-10

Family

ID=78843968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111080227.8A Pending CN113781510A (en) 2021-09-15 2021-09-15 Edge detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113781510A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012073953A (en) * 2010-09-29 2012-04-12 Olympus Corp Image processing device, image processing method, and image processing program
US20210027088A1 (en) * 2018-03-21 2021-01-28 Guangzhou Xaircraft Technology Co.,Ltd. Method and device for acquiring boundary of area to be operated, and method for planning operation route
US20210074036A1 (en) * 2018-03-23 2021-03-11 Memorial Sloan Kettering Cancer Center Deep encoder-decoder models for reconstructing biomedical images
CN108520524A (en) * 2018-04-10 2018-09-11 深圳市唯特视科技有限公司 A kind of Image Iterative over-segmentation method based on edge cluster
WO2020253062A1 (en) * 2019-06-20 2020-12-24 平安科技(深圳)有限公司 Method and apparatus for detecting image border
CN111612807A (en) * 2020-05-15 2020-09-01 北京工业大学 Small target image segmentation method based on scale and edge information
CN113344005A (en) * 2021-05-12 2021-09-03 武汉大学 Image edge detection method based on optimized small-scale features

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114671214A (en) * 2022-05-27 2022-06-28 新风光电子科技股份有限公司 Coal mine conveying belt speed regulating method based on high-voltage frequency converter
CN115329932A (en) * 2022-08-05 2022-11-11 中国民用航空飞行学院 Airplane landing attitude monitoring method based on digital twins
CN115588024A (en) * 2022-11-25 2023-01-10 东莞市兆丰精密仪器有限公司 Artificial intelligence-based complex industrial image edge extraction method and device
CN117372431A (en) * 2023-12-07 2024-01-09 青岛天仁微纳科技有限责任公司 Image detection method of nano-imprint mold
CN117372431B (en) * 2023-12-07 2024-02-20 青岛天仁微纳科技有限责任公司 Image detection method of nano-imprint mold

Similar Documents

Publication Publication Date Title
CN110598610B (en) Target significance detection method based on neural selection attention
CN113781510A (en) Edge detection method and device and electronic equipment
CN110648334A (en) Multi-feature cyclic convolution saliency target detection method based on attention mechanism
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
CN113096140B (en) Instance partitioning method and device, electronic device and storage medium
CN110020658B (en) Salient object detection method based on multitask deep learning
CN113379707A (en) RGB-D significance detection method based on dynamic filtering decoupling convolution network
CN113449690A (en) Method and system for detecting image scene change and electronic equipment
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN111144425A (en) Method and device for detecting screen shot picture, electronic equipment and storage medium
CN112132867B (en) Remote sensing image change detection method and device
CN111179245B (en) Image quality detection method, device, electronic equipment and storage medium
CN111079807B (en) Ground object classification method and device
CN112818774A (en) Living body detection method and device
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN116798041A (en) Image recognition method and device and electronic equipment
CN112348809A (en) No-reference screen content image quality evaluation method based on multitask deep learning
CN114283431B (en) Text detection method based on differentiable binarization
CN109886865A (en) Method, apparatus, computer equipment and the storage medium of automatic shield flame
US11200708B1 (en) Real-time color vector preview generation
CN115457385A (en) Building change detection method based on lightweight network
CN114529828A (en) Method, device and equipment for extracting residential area elements of remote sensing image
Nair et al. Benchmarking single image dehazing methods
CN114494712A (en) Object extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination