CN112150493B - Semantic guidance-based screen area detection method in natural scene - Google Patents


Info

Publication number
CN112150493B
CN112150493B
Authority
CN
China
Prior art keywords
image
screen
edge
module
semantic
Prior art date
Legal status
Active
Application number
CN202011004389.9A
Other languages
Chinese (zh)
Other versions
CN112150493A (en)
Inventor
黄胜
冉浩杉
张盛峰
李洋洋
付川
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202011004389.9A
Publication of CN112150493A
Application granted
Publication of CN112150493B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06T3/02
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4084Transform-based scaling, e.g. FFT domain scaling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention provides a semantic guidance-based screen area detection method for natural scenes, addressing the problems of locating the screen position in a natural scene and of the rough screen edges produced by edge detection techniques based on a Fully Convolutional Network (FCN). A semantic guidance-based edge detection network is proposed for screen edge detection. The network is divided into two parts: one is composed of a deconvolution module and completes an image segmentation task; the other performs an image edge detection task after fusing feature maps of different scales. The model is trained on the image segmentation and image edge detection tasks simultaneously, and the outputs of the two tasks are finally fused to obtain the final edge image. In the screen area positioning stage, straight lines are detected in the edge image by Hough Transform, coincident lines are removed, the screen corner points satisfying the conditions are extracted, the region angle is corrected by Affine Transform, and the screen content image is finally obtained.

Description

Semantic guidance-based screen area detection method in natural scene
Technical Field
The invention relates to the field of deep learning and computer vision, in particular to an edge detection network and a screen positioning method based on semantic guidance.
Background
With the advance of science and technology, the computing power of portable devices such as mobile phones keeps growing, camera-equipped mobile devices are increasingly popular, and photographs and videos can be taken conveniently with them. People often need to record important information shown on a screen with a portable device such as a mobile phone, but when the screen is shot, the background outside the screen is inevitably captured as well, and this background strongly interferes with subsequent processing of the screen content.
On the other hand, when screen content is shot with a portable device such as a mobile phone in a natural scene, it is inevitably disturbed by many factors of the scene, and this interference affects the accuracy of the subsequent screen edge detection results. A screen positioning technology suited to natural conditions is therefore needed to locate the screen accurately and thereby reduce the interference of the external noise introduced under natural conditions with the analysis of the screen content. Research on screen positioning in natural scenes is still scarce, and further exploration and study are urgently needed.
In the field of computer vision, screens are usually detected with traditional edge detection methods: edge detection is applied to the whole image, and the target screen edge is finally picked out from the many image edges by matching hand-crafted features. However, traditional edge detection has unavoidable drawbacks. On the one hand, detecting all edges in the whole picture introduces many interfering edge pixels from the natural scene, which makes the subsequent search for the target edge through hand-crafted features harder. On the other hand, most traditional edge detection methods require a manually set threshold to adjust the detection sensitivity: if it is set too high, too many interfering edges are detected and hand-crafted feature matching becomes impossible; if too low, the required screen edge is not detected.
Chinese patent application publication No. CN102236784A discloses detecting screen edges by scanning suspected edges in the image with the Hough transform and fitting multiple lines. US patent application publication US20080266253A discloses a system for tracking a spot in a computer projection area: the system binarizes the captured image and screens quadrangles from the binarized pixels to obtain the screen area. However, these algorithms that detect the screen edge with traditional methods cannot meet the requirements of different scenes, and their anti-interference capability is weak.
Edge detection algorithms based on deep learning have been widely studied in recent years. With the development of artificial intelligence, edge detection algorithms based on deep Convolutional Neural Networks have been proposed, such as the classical HED (Holistically-nested Edge Detection) and RCF (Richer Convolutional Features) detectors, and deep learning-based detection has achieved good results; as the architectures of deep convolutional neural networks improve, their detection performance keeps getting better.
Meanwhile, considering that the image edges output by deep learning-based edge detection networks are rough and blurred, the invention designs a semantic guidance-based edge detection network that combines the image segmentation task with the image edge detection task, merging the rich semantic information of image segmentation into edge detection to obtain a more refined screen edge image.
Disclosure of Invention
The invention aims to design a method for obtaining the screen area in a natural scene through a semantic guidance-based edge detection network and a screen area positioning algorithm. A screen area detection system is realized on this basis: the edge detection network with semantic guidance runs on a GPU module at the server side, while the screen edge corner screening algorithm used in the subsequent screen area positioning stage runs on a CPU module at the front end or client side. Separating the front end and back end reduces the amount of front-end computation and thus improves the screen detection efficiency of the system.
The invention provides a semantic guidance-based method for detecting the screen area in a natural scene, comprising: an image preprocessing module that preprocesses images shot in natural scenes, including image denoising and contrast enhancement; and a semantic guidance-based edge detection network that fuses the rich semantic information of the predicted image from the image segmentation task, merges the final output prediction of the image edge detection branch with the output prediction of the image segmentation task, and applies deep supervision with edge detection labels to obtain a refined edge detection image.
The invention mainly comprises two parts: a semantic-guided edge detection network and a screen edge corner screening algorithm. The method specifically comprises the following steps:
1. acquiring a scene screen image shot by a user's mobile phone, and preprocessing the natural scene image;
2. constructing an edge detection network based on semantic guidance;
3. pre-training the network by utilizing open source data and simulation data in related fields;
4. fine-tuning the pre-trained neural network through transfer learning with a small self-made annotated natural-scene screen dataset;
5. performing screen edge detection with the transfer-learned network on the screen edge data prepared in the test set to obtain the final screen edge image;
6. performing post-processing with the refined screen edge image obtained from the edge detection neural network, including removing repeated lines and non-edge lines, and screening out the four most likely screen corner points in combination with the screen edge features;
7. After the screen edge feature screening algorithm obtains the screen corner points with the highest confidence, affine transformation is used to correct the image tilt angle. The affine transformation is expressed as:

$$\vec{y} = A\vec{x} + \vec{b}$$

where $\vec{x}$ and $\vec{b}$ denote the pixel coordinate vectors of the image and the translation quantity, and $A$ is the affine matrix expressing the rotation, magnification and scaling of the image. In homogeneous coordinates this expression is equivalent to:

$$\begin{bmatrix} \vec{y} \\ 1 \end{bmatrix} = \begin{bmatrix} A & \vec{b} \\ \mathbf{0}^{\mathsf{T}} & 1 \end{bmatrix} \begin{bmatrix} \vec{x} \\ 1 \end{bmatrix}$$

The affine transformation maps the pixel coordinate vectors $\vec{x}$ of the screen region in the original image onto the upright screen; the pixel vectors become $\vec{y}$, completing the angle correction.
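As an illustration of this correction step, the following sketch maps the four detected corner points to an upright rectangle with OpenCV. The use of cv2.getPerspectiveTransform (a projective transform, which generalizes the affine mapping above to arbitrary four-point correspondences) and the corner ordering convention are assumptions for illustration, not prescribed by the invention:

```python
import cv2
import numpy as np

def correct_screen_angle(image, corners):
    """Map four detected screen corner points to an upright rectangle.

    `corners` is assumed to be ordered top-left, top-right,
    bottom-right, bottom-left; the ordering logic is omitted here.
    """
    src = np.asarray(corners, dtype=np.float32)
    # Target size estimated from the detected quadrilateral edges.
    width = int(max(np.linalg.norm(src[0] - src[1]),
                    np.linalg.norm(src[3] - src[2])))
    height = int(max(np.linalg.norm(src[0] - src[3]),
                     np.linalg.norm(src[1] - src[2])))
    dst = np.float32([[0, 0], [width - 1, 0],
                      [width - 1, height - 1], [0, height - 1]])
    # A projective transform handles the general four-point case.
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (width, height))
```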
The semantic guidance-based edge detection network in the above steps is the core of the invention. It is a two-branch neural network structure built on a fully convolutional neural network; through the two branches the network can learn the image segmentation and image edge detection tasks. It comprises a feature extraction module, an image segmentation module, an image edge detection module and a semantic guidance fusion module.
The feature extraction module consists of the fully convolutional network obtained by removing the fully connected layers of VGG16. To enlarge the receptive field of the network without losing a large amount of local information, Hybrid Dilated Convolution is added to the last two convolution stages: a group of three convolution kernels with different dilation rates is applied in sequence, which reduces the gridding holes produced by dilated convolution while enlarging the receptive field.
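A minimal sketch of such a hybrid dilated convolution group, assuming TensorFlow/Keras layers; the dilation rates (1, 2, 5) and the filter count are illustrative, since the invention only specifies three different rates:

```python
import tensorflow as tf

def hybrid_dilated_block(x, filters, rates=(1, 2, 5)):
    """Three 3x3 convolutions with increasing dilation rates applied in
    sequence, enlarging the receptive field while limiting the gridding
    holes of dilated convolution. The rates (1, 2, 5) are assumptions."""
    for r in rates:
        x = tf.keras.layers.Conv2D(filters, 3, padding="same",
                                   dilation_rate=r, activation="relu")(x)
    return x
```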
In the image segmentation module, a deconvolution channel is built at the left end of the network. Upsampling is performed through four deconvolution layers, deconvolving the final high-level semantic feature map of the backbone network to the size of the original image; deep supervision with the image segmentation labels then trains the network on the image segmentation task, and a segmented image of the original size is finally output.
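A minimal sketch of this deconvolution channel, assuming Keras Conv2DTranspose layers; the filter counts are illustrative:

```python
import tensorflow as tf

def segmentation_decoder(features, filters=(256, 128, 64, 32)):
    """Upsample the high-level backbone feature map back to the input
    resolution with four stride-2 deconvolutions, then predict a
    single-channel segmentation map supervised by segmentation labels."""
    x = features
    for f in filters:
        x = tf.keras.layers.Conv2DTranspose(f, 4, strides=2,
                                            padding="same",
                                            activation="relu")(x)
    return tf.keras.layers.Conv2D(1, 1, activation="sigmoid",
                                  name="seg_output")(x)
```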
The image edge detection module fuses image features through a multi-scale Feature Fusion Module with an attention mechanism, built from an SE-ResNeXt module that combines SE Block and ResNeXt Block. After the feature map output of each Block of the backbone network enters the multi-scale feature fusion module, the feature maps of different scales pass through the SE-ResNeXt module: a ResNeXt operation with a residual group convolution structure enriches the semantic information of the input feature maps, which are then sent into the SE module, where each channel is given a learnable weight so that the model actively learns the importance of each channel of the feature map and, according to that importance, can promote useful features and suppress features that are of no use to the current task. Finally, deep supervision with the image edge labels trains the network on the edge detection task, and an edge image of the original size is finally output.
The semantic guidance fusion module fuses the image features extracted by the edge detection module and the image segmentation module, letting the semantic features extracted by the image segmentation module guide the model to output finer image edge features. The output results of the two branches are concatenated along the channel dimension and reduced in dimension, fusing the rich semantic information of image segmentation with the image edge detection task to obtain a fine image edge detection result.
Further, in order to train the network better, a weighted cross-entropy loss function is adopted so that the labels can fully supervise the feature maps of every layer. The loss function of the m-th layer is expressed as:

$$\ell^{(m)}(W) = -\beta \sum_{j \in Y_{+}} \log \Pr(x_{j}; W) - (1-\beta) \sum_{j \in Y_{-}} \log\bigl(1 - \Pr(x_{j}; W)\bigr)$$

where:

$$\beta = \frac{|Y_{-}|}{|Y_{+}| + |Y_{-}|}, \qquad 1-\beta = \frac{|Y_{+}|}{|Y_{+}| + |Y_{-}|}$$

Here $\Pr(x_{j}; W)$ is the activation value of the feature map pixel $x_{j}$ in the m-th layer prediction map, obtained with the sigmoid activation $a_{j} = \mathrm{sigmoid}(x_{j})$; $Y_{+}$ and $Y_{-}$ denote the sets of screen-edge and non-edge pixels in the Ground Truth, and $W$ denotes all parameters of the network that need training.
The weighted combination of the per-layer losses, as the j-th pixel value of each layer passes through the multi-scale feature fusion module on the left side, is expressed as follows, where $w_{1} = w_{2} = w_{3} = w_{4} = 0.2$ and $w_{5} = 0.28$:

$$L_{(side)}(W) = \sum_{m=1}^{5} w_{m}\, \ell^{(m)}(W)$$
Combining the losses of the above layers, the loss function of the fused layer applies the same weighted cross-entropy to the fused prediction:

$$L_{(fusion)}(W) = -\beta \sum_{j \in Y_{+}} \log \hat{Y}_{j}^{(fusion)} - (1-\beta) \sum_{j \in Y_{-}} \log\bigl(1 - \hat{Y}_{j}^{(fusion)}\bigr)$$

where $\hat{Y}^{(fusion)} = \mathrm{sigmoid}(A^{(fusion)})$ and $A^{(fusion)} = \{a_{j}^{(side)} \mid j = 1, 2, \ldots, |Y|\}$ is the set of output values of each layer.
Finally, the image segmentation task and the image edge task are fused and their final loss functions are added; the overall loss is expressed as:

$$L = L_{(edge\_fusion)} + L_{(seg\_fusion)}$$
Adding the two loss functions as the final loss enables the network to better fuse the rich semantic information of the image segmentation task and makes the model converge faster during training.
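A minimal TensorFlow sketch of the class-balanced weighting defined in the per-layer loss above; the helper name and the use of sigmoid cross-entropy on logits are assumptions:

```python
import tensorflow as tf

def weighted_edge_loss(logits, labels):
    """Class-balanced binary cross-entropy: edge pixels (Y+) are far
    rarer than non-edge pixels (Y-), so each class is weighted by the
    relative frequency of the other, as in the per-layer loss above."""
    labels = tf.cast(labels, tf.float32)
    num_pos = tf.reduce_sum(labels)
    num_neg = tf.reduce_sum(1.0 - labels)
    beta = num_neg / (num_pos + num_neg)
    # Per-pixel weights: beta for edge pixels, (1 - beta) for the rest.
    weights = labels * beta + (1.0 - labels) * (1.0 - beta)
    ce = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                 logits=logits)
    return tf.reduce_sum(weights * ce)
```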
Furthermore, the screen corner screening algorithm part mainly uses the refined screen edge image produced by the proposed edge detection network to screen the screen corner points. First, straight lines are detected by Hough transform, and repeated lines are removed by a line de-duplication method. All line intersection points are put into a set; the enclosed area and the perimeter of every four intersection points are computed and sorted, and the four intersection points whose quadrilateral has the largest area and the longest perimeter are selected as the corner points of the screen image. The line de-duplication method is: set a distance threshold $T_{d}$ and an angle threshold $T_{\theta}$; if the distance between any two lines is smaller than $T_{d}$ and the angle difference between the two lines is smaller than $T_{\theta}$, the line of smaller length is deleted. Finally, affine transformation is applied to the obtained screen edge corner points to correct the angle of the screen area and obtain the screen content image.
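As an illustration of the line detection and de-duplication step, the sketch below uses OpenCV's Hough transform. The threshold values, and the rule of dropping the later-ranked duplicate (standard HoughLines returns infinite lines, so segment length is not directly available), are assumptions:

```python
import cv2
import numpy as np

def dedup_lines(lines, t_d=20.0, t_theta=np.deg2rad(5)):
    """Remove coincident Hough lines: if two lines differ by less than
    the distance threshold t_d and the angle threshold t_theta, only
    the first (higher-ranked) one is kept. Thresholds are illustrative;
    the invention leaves them as tunable parameters."""
    kept = []
    for rho, theta in lines:
        if all(abs(rho - r2) >= t_d or abs(theta - t2) >= t_theta
               for r2, t2 in kept):
            kept.append((rho, theta))
    return kept

edge_img = cv2.imread("screen_edges.png", cv2.IMREAD_GRAYSCALE)
hough = cv2.HoughLines(edge_img, 1, np.pi / 180, threshold=120)
lines = [(l[0][0], l[0][1]) for l in hough] if hough is not None else []
lines = dedup_lines(lines)
```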
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The invention designs a semantic information-guided edge detection network that uses the semantic information obtained in the image segmentation task to guide the model in predicting image edges. The network makes full use of the rich semantic features of the image segmentation task: a series of deconvolutions upsample the important image features extracted by the backbone network to the original image size, and deep supervision with the image segmentation task labels finally yields the segmented image. The edge image output obtained by multi-scale fusion at the right end is fused with the segmented image, and an edge image label is added for deep supervision, so that high-level semantic features are fully exploited and a more refined edge image is obtained.
2. The invention provides a multi-scale Feature Fusion Module with an attention mechanism to fuse the feature maps of different scales output by the backbone network and merge the multi-scale feature information into the edge image. By adding the SE-ResNeXt module to the multi-scale feature fusion module, the feature map is first sent into a ResNeXt structure with residual group convolution to enrich the semantic information of the input feature map, and then into the SE module, where each channel is given a learnable weight so that the model can actively learn the importance of each channel of the feature map.
Drawings
In order to make the purpose, technical scheme and beneficial effects of the invention clearer, the invention provides the following drawings for description:
FIG. 1 is a schematic flow chart of a method for detecting a screen area in a natural scene based on semantic guidance according to the present invention;
FIG. 2 is a schematic flow diagram of an image edge detection network module incorporating semantic information guidance according to the present invention;
FIG. 3 is a schematic diagram of the network architecture for image edge detection with semantic information guidance;
FIG. 4 is a multi-scale feature fusion module of the present invention with attention mechanism;
FIG. 5 is a schematic diagram of a post-processing flow of the screen region detection method of the present invention.
Detailed description of the preferred embodiments
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a semantic guidance-based screen region detection algorithm in a natural scene, which specifically comprises the following steps as shown in figure 1:
Step 1: input an image and apply two simple preprocessing operations, denoising and contrast enhancement;
Step 2: construct an edge detection neural network that fuses image segmentation semantic information, and input the image into the network to detect the screen edge in the natural scene;
Step 3: select the four screen corners in the screen edge image with the screen edge corner screening algorithm and record the corner positions;
Step 4: send the four corner points into the affine transformation for screen tilt correction to obtain the screen content at the correct angle;
Step 5: crop the screen area image after the affine transformation to obtain the final screen content image.
This embodiment provides the specific implementation steps of the semantic guidance-based screen area detection method in a natural scene. The screen area detection system in the natural scene comprises: an image preprocessing module, an edge detection module, a screen area positioning module, an affine transformation module and a content acquisition module.
Step 1: a scene screen image shot by a mobile phone is acquired and input into the preprocessing module, where operations such as denoising and contrast enhancement preprocess the image and enhance the edge features of the input image.
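A minimal sketch of this preprocessing step, assuming OpenCV non-local-means denoising and CLAHE contrast enhancement; the invention names the operations but not the specific algorithms or parameters:

```python
import cv2

def preprocess(image_bgr):
    """Denoise, then enhance contrast on the luminance channel so that
    screen edges stand out against the natural-scene background."""
    denoised = cv2.fastNlMeansDenoisingColored(image_bgr, None,
                                               10, 10, 7, 21)
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```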
Step 2: the preprocessed image is input into the edge detection module, in which the semantic guidance-based edge detection network shown in fig. 2 is constructed; it consists of a feature extraction module, an image segmentation module, an edge detection module and a semantic fusion module. The feature extraction module is the backbone of the edge detection network and uses a VGG16 network with the fully connected layers removed; the image segmentation module performs the image segmentation task with the semantic features extracted by the feature extraction module and is supervised by the image segmentation labels; the edge detection module performs the edge detection task with the detail features of each layer extracted by the feature extraction module and is supervised by the image edge labels; the semantic fusion module performs semantic guidance fusion with the semantic features extracted by the image segmentation module and the edge features extracted by the edge detection module to obtain the final edge image.
Step 3: the edge detection network is built with the TensorFlow framework, as shown in fig. 3. The image segmentation channel in the network upsamples the important image features extracted by the backbone network to the original image size with a series of deconvolutions, and deep supervision with the image segmentation task labels finally yields the segmented image. The multi-scale feature map fusion module used by the image edge detection channel fuses the multi-scale output feature maps of the backbone network and applies deep supervision with the image edge task labels. Finally, the edge image output obtained by multi-scale fusion at the right end is fused with the segmented image, and an edge image label is added for deep supervision, so that high-level semantic features are fully exploited and a more refined edge image is obtained.
Step 4: in the edge detection module of the network, a multi-scale Feature Fusion Module is constructed to fuse the feature maps of different scales output by the backbone network and merge the multi-scale feature information into the edge image. As shown in fig. 3, the multi-scale feature module receives the feature maps of different scales output by the Blocks of the backbone network and performs residual learning and channel weight learning through the SE-ResNeXt module, so that the output feature information is richer, the channels carrying important feature information can be distinguished among all received channels, and unimportant feature channels are suppressed.
Finally, 1 × 1 convolution dimension reduction and upsampling are applied to the feature maps of different scales, and the feature maps obtained from the 5 channels are concatenated along the channel dimension to give an output feature map of the original image size with 5 channels. The corresponding weights of the 5 channels are learned through an SE Block to distinguish the importance of each channel; the final output of the image edge detection task is then obtained through a 1 × 1 convolution dimension reduction and is supervised by the edge detection labels.
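A minimal sketch of this fusion path, assuming Keras ops; the SE-ResNeXt branches are abstracted away and only the reduce-resize-concatenate-reduce flow is shown (the SE reweighting appears in the next sketch):

```python
import tensorflow as tf

def fuse_side_outputs(side_maps, target_hw):
    """Reduce each scale to one channel with a 1x1 convolution,
    upsample to the input size, concatenate the 5 single-channel maps,
    and reduce to the final edge map with another 1x1 convolution
    (the SE channel reweighting step is omitted here)."""
    resized = []
    for fmap in side_maps:  # 5 feature maps at different scales
        x = tf.keras.layers.Conv2D(1, 1)(fmap)   # 1x1 dimension reduction
        x = tf.image.resize(x, target_hw)        # upsample to input size
        resized.append(x)
    fused = tf.keras.layers.Concatenate(axis=-1)(resized)  # 5 channels
    return tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(fused)
```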
Weight learning for each channel of the input feature map is performed with SE Block. The learned weight information is denoted $z \in \mathbb{R}^{C}$ and is produced by squeezing each channel feature map $u_{c}$ of spatial size $W \times H$. The $c$-th element of $z$ is computed as:

$$z_{c} = F_{sq}(u_{c}) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_{c}(i, j)$$

The output $z_{c}$ can be regarded as a descriptor of the channel feature map, representing the weight occupied by the current channel.
The features of each layer of the backbone network are fused through the multi-scale feature fusion module, important feature information is distinguished from unimportant information through the SE Block, and the predicted edge image of the edge detection module is finally output.
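A compact sketch of the SE Block itself, following the squeeze formula above; the reduction ratio of 16 is the common default and an assumption here:

```python
import tensorflow as tf

def se_block(x, reduction=16):
    """Squeeze: global average pooling produces one descriptor z_c per
    channel. Excitation: two fully connected layers learn per-channel
    weights, which then rescale the input feature map."""
    channels = x.shape[-1]
    z = tf.keras.layers.GlobalAveragePooling2D()(x)          # squeeze
    s = tf.keras.layers.Dense(channels // reduction,
                              activation="relu")(z)
    s = tf.keras.layers.Dense(channels, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, 1, channels))(s)
    return tf.keras.layers.Multiply()([x, s])                # reweight
```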
Step 5: define the loss function of the semantic guidance-based edge detection network; the semantic features extracted by the image segmentation module and the image edge features extracted by image edge detection are fused, and a new loss function is defined to train the network. To make the network training more sufficient, a weighted cross-entropy loss function is adopted so that the labels can fully supervise the feature maps of every layer. The loss function of the m-th layer is expressed as:

$$\ell^{(m)}(W) = -\beta \sum_{j \in Y_{+}} \log \Pr(x_{j}; W) - (1-\beta) \sum_{j \in Y_{-}} \log\bigl(1 - \Pr(x_{j}; W)\bigr)$$

where:

$$\beta = \frac{|Y_{-}|}{|Y_{+}| + |Y_{-}|}, \qquad 1-\beta = \frac{|Y_{+}|}{|Y_{+}| + |Y_{-}|}$$

Here $\Pr(x_{j}; W)$ is the activation value of the feature map pixel $x_{j}$ in the m-th layer prediction map, obtained with the sigmoid activation $a_{j} = \mathrm{sigmoid}(x_{j})$; $Y_{+}$ and $Y_{-}$ denote the sets of screen-edge and non-edge pixels in the Ground Truth, and $W$ denotes all parameters of the network that need training.
The weighted combination of the per-layer losses, as the j-th pixel value of each layer passes through the multi-scale feature fusion module on the left side, is expressed as follows, where $w_{1} = w_{2} = w_{3} = w_{4} = 0.2$ and $w_{5} = 0.28$:

$$L_{(side)}(W) = \sum_{m=1}^{5} w_{m}\, \ell^{(m)}(W)$$
Combining the losses of the above layers, the loss function of the fused layer applies the same weighted cross-entropy to the fused prediction:

$$L_{(fusion)}(W) = -\beta \sum_{j \in Y_{+}} \log \hat{Y}_{j}^{(fusion)} - (1-\beta) \sum_{j \in Y_{-}} \log\bigl(1 - \hat{Y}_{j}^{(fusion)}\bigr)$$

where $\hat{Y}^{(fusion)} = \mathrm{sigmoid}(A^{(fusion)})$ and $A^{(fusion)} = \{a_{j}^{(side)} \mid j = 1, 2, \ldots, |Y|\}$ is the set of output values of each layer.
Finally, the image segmentation task and the image edge task are fused and their final loss functions are added; the overall loss is expressed as:

$$L = L_{(edge\_fusion)} + L_{(seg\_fusion)}$$
The network is trained under dual-label supervision; a VGG16 network is adopted as the backbone, its weights are shared between the two tasks, and the fine image screen edge is finally obtained by fusing the two tasks.
Step 6: train the constructed edge detection network. Through transfer learning, the network is pre-trained with open-source data and simulated data from related fields, and the pre-trained network is then fine-tuned with the self-made annotated screen dataset.
Step 7: save the trained edge detection network, deploy it to the GPU module of the server, and set the network to listen on a port. When the client sends an input image through the listening port, the edge detection network deployed on the server automatically runs inference to obtain the edge image corresponding to the input image and returns it to the client through the corresponding port.
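As an illustration of this server-side deployment, the sketch below exposes the network behind a minimal HTTP endpoint; the choice of Flask, the route name, the payload format and the saved-model path are all assumptions, since the invention only specifies a listening port on the server's GPU module:

```python
import io

import numpy as np
import tensorflow as tf
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("edge_net")  # hypothetical saved model

@app.route("/edge", methods=["POST"])
def edge():
    """Receive a preprocessed image, run edge inference on the GPU,
    and return the predicted edge map to the client."""
    img = np.asarray(Image.open(request.files["image"].stream)
                     .convert("RGB")) / 255.0
    pred = model.predict(img[None, ...])[0, ..., 0]
    out = Image.fromarray((pred * 255).astype(np.uint8))
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```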
Step 8: predict the screen edge image in the natural scene. The edge detection network on the server side is called with the preprocessed input image, and the refined screen edge image is returned.
Step 9: apply post-processing to the screen edge image; the post-processing flow is shown in fig. 5. First, the Hough transform of the OpenCV library is called to detect straight lines in the screen edge image, obtaining the candidate screen edge lines in all similar directions in the edge image.
Step 10: remove coincident lines from all detected lines. The line de-duplication method is: set a distance threshold $T_{d}$ and an angle threshold $T_{\theta}$; if the distance between any two lines is smaller than $T_{d}$ and the angle difference between the two lines is smaller than $T_{\theta}$, the line of smaller length is deleted.
Step 11: sort the remaining line intersection points as a set, take four points each time to compute the perimeter and the enclosed area, and take the four points for which both are maximal as the screen edge corner points in the natural scene.
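A sketch of this exhaustive four-point screening, assuming the intersection set is small after de-duplication; combining area and perimeter into a single sort key is our simplification of "largest area and longest perimeter":

```python
from itertools import combinations

import cv2
import numpy as np

def pick_screen_corners(points):
    """Try every 4-point subset of the line intersections and keep the
    quadrilateral with the largest enclosed area (perimeter as a
    tie-breaker), treating it as the screen corner points."""
    best, best_key = None, (-1.0, -1.0)
    for quad in combinations(points, 4):
        hull = cv2.convexHull(np.float32(quad))
        if len(hull) < 4:          # degenerate / collinear points
            continue
        key = (cv2.contourArea(hull), cv2.arcLength(hull, True))
        if key > best_key:
            best, best_key = hull.reshape(-1, 2), key
    return best
```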
Step 12: correct the screen tilt angle with the screen corner points and affine transformation, finally obtaining the screen content image.

Claims (5)

1. A method for detecting the screen area in a natural scene based on semantic guidance, characterized in that a screen picture shot in a natural scene is processed to obtain the screen content, the method specifically comprising the following steps:
step 1, acquiring a scene screen image shot by a mobile phone of a user, and preprocessing an input image;
step 2, constructing an edge detection network based on semantic guidance, comprising a feature extraction module, an image segmentation module, an image edge detection module and a semantic guidance fusion module; the image segmentation module constructs an expanding path through deconvolution to extract image semantic features and segment the image; the image edge detection module extracts and fuses edge features through a multi-scale Feature Fusion Module with an attention mechanism; the semantic guidance fusion module fuses the semantic features extracted by the image segmentation module with the edge features of the image edge detection module to obtain a refined edge image under semantic guidance;
step 3, fine-tuning the network by using a self-made screen edge data set in a transfer learning mode;
step 4, performing screen edge detection on the input image on the trained neural network to obtain a screen edge image;
and 5, performing post-processing operation by using the obtained screen edge image, screening out four screen corner points in the image by combining with the screen edge characteristics, and performing inclination angle correction through affine transformation to obtain a final screen content image.
2. The method as claimed in claim 1, characterized in that the feature extraction module consists of the fully convolutional network obtained by removing the fully connected layers of VGG16; to enlarge the receptive field of the network without losing a large amount of local information, Hybrid Dilated Convolution is added to the last two convolution stages, where a group of three convolution kernels with different dilation rates is applied in sequence, reducing the gridding holes produced by dilated convolution while enlarging the receptive field.
3. The method for detecting the screen area in a natural scene based on semantic guidance according to claim 1, characterized in that the image edge detection module fuses image features through a multi-scale Feature Fusion Module with an attention mechanism, which uses an SE-ResNeXt module obtained by combining SE Block and ResNeXt Block; after entering the multi-scale feature fusion module, the feature map outputs of different scales from each Block of the backbone network pass through the SE-ResNeXt module, where a ResNeXt operation with a residual group convolution structure enriches the semantic information of the input feature maps; a Squeeze-and-Excitation (SE) operation then gives each channel a learnable weight, so that the model actively learns the importance of each channel of the feature map and, according to that importance, can promote useful features and suppress features that are not useful for the current task.
4. The method for detecting the screen area in a natural scene based on semantic guidance according to claim 1, characterized in that the semantic guidance fusion module fuses the image features extracted by the edge detection module and the image segmentation module, letting the semantic features extracted by the image segmentation module guide the model to output finer image edge features; a new model loss function is defined in the semantic guidance fusion module to fuse the two kinds of output feature information, trained under the guidance of the edge labels; the newly defined loss function is expressed as:
$$L = L_{fusion}\bigl(f(F_{seg}, F_{edge} \mid X; W); W_{f}\bigr)$$

where $F_{seg}$ is the semantic feature extracted by the image segmentation module, $F_{edge}$ is the edge feature extracted by the image edge detection module, $f(\cdot \mid W)$ denotes the feature map fusion operation with $W$ the parameters of the convolution operation, and $L_{fusion}(F; W_{f})$ denotes the adopted cross-entropy function, expressed as:

$$L_{fusion}(F; W_{f}) = -\frac{1}{N} \sum_{i=1}^{N} \log \Pr(y_{i} \mid F_{i})$$

where $F_{i}$ is the i-th pixel in the feature map, $\Pr(y_{i} \mid F_{i})$ is the predicted probability of the label $y_{i}$ at pixel $i$, $N$ is the total number of image pixels, and $W_{f}$ is the set of training parameters of the image segmentation task.
5. The method for detecting the screen area in a natural scene based on semantic guidance according to claim 1, characterized in that the post-processing of the screen edge image mainly comprises: detecting straight lines in the screen edge image based on the Hough transform and removing coincident lines; sorting the line intersection points as a set, taking four points each time to compute the perimeter and the enclosed area, and taking the four points for which both are maximal as the screen edge corner points in the natural scene; finally, correcting the screen tilt angle with the screen corner points and affine transformation to finally obtain the screen content image.
CN202011004389.9A 2020-09-22 2020-09-22 Semantic guidance-based screen area detection method in natural scene Active CN112150493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011004389.9A CN112150493B (en) 2020-09-22 2020-09-22 Semantic guidance-based screen area detection method in natural scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011004389.9A CN112150493B (en) 2020-09-22 2020-09-22 Semantic guidance-based screen area detection method in natural scene

Publications (2)

Publication Number Publication Date
CN112150493A CN112150493A (en) 2020-12-29
CN112150493B (en) 2022-10-04

Family

ID=73897546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011004389.9A Active CN112150493B (en) 2020-09-22 2020-09-22 Semantic guidance-based screen area detection method in natural scene

Country Status (1)

Country Link
CN (1) CN112150493B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700462A (en) * 2020-12-31 2021-04-23 北京迈格威科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN112784718B (en) * 2021-01-13 2023-04-25 上海电力大学 Insulator state identification method based on edge calculation and deep learning
CN112950615B (en) * 2021-03-23 2022-03-04 内蒙古大学 Thyroid nodule invasiveness prediction method based on deep learning segmentation network
CN112966691B (en) * 2021-04-14 2022-09-16 重庆邮电大学 Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN112926551A (en) * 2021-04-21 2021-06-08 北京京东乾石科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN113192060A (en) * 2021-05-25 2021-07-30 上海商汤临港智能科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN113469199A (en) * 2021-07-15 2021-10-01 中国人民解放军国防科技大学 Rapid and efficient image edge detection method based on deep learning
CN113344827B (en) * 2021-08-05 2021-11-23 浙江华睿科技股份有限公司 Image denoising method, image denoising network operation unit and device
CN114882091B (en) * 2022-04-29 2024-02-13 中国科学院上海微系统与信息技术研究所 Depth estimation method combining semantic edges
CN115512368A (en) * 2022-08-22 2022-12-23 华中农业大学 Cross-modal semantic image generation model and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105163078A (en) * 2015-09-01 2015-12-16 University of Electronic Science and Technology of China Screen removal intelligent video monitoring system
CN108734719A (en) * 2017-04-14 2018-11-02 Zhejiang Gongshang University Automatic foreground-background segmentation method for lepidopteran insect images based on fully convolutional neural networks
CN108830855A (en) * 2018-04-02 2018-11-16 South China University of Technology Fully convolutional network semantic segmentation method based on multi-scale low-level feature fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188794B2 (en) * 2017-08-10 2021-11-30 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105163078A (en) * 2015-09-01 2015-12-16 University of Electronic Science and Technology of China Screen removal intelligent video monitoring system
CN108734719A (en) * 2017-04-14 2018-11-02 Zhejiang Gongshang University Automatic foreground-background segmentation method for lepidopteran insect images based on fully convolutional neural networks
CN108830855A (en) * 2018-04-02 2018-11-16 South China University of Technology Fully convolutional network semantic segmentation method based on multi-scale low-level feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Richer Convolutional Features for Edge Detection; LIU Y; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018-10-31; Vol. 41, No. 8; full text *
Edge detection with cross-layer fused features based on RCF; SONG Jie; Journal of Computer Applications; 2020-07-10; full text *

Also Published As

Publication number Publication date
CN112150493A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN111612008A (en) Image segmentation method based on convolution network
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN113592911B (en) Apparent enhanced depth target tracking method
CN113052170A (en) Small target license plate recognition method under unconstrained scene
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN113205103A (en) Lightweight tattoo detection method
CN113763417B (en) Target tracking method based on twin network and residual error structure
CN111582057B (en) Face verification method based on local receptive field
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN110910497B (en) Method and system for realizing augmented reality map
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN113052311B (en) Feature extraction network with layer jump structure and method for generating features and descriptors
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
Lin et al. Ml-capsnet meets vb-di-d: A novel distortion-tolerant baseline for perturbed object recognition
CN114202694A (en) Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning
Zhou et al. A lightweight object detection framework for underwater imagery with joint image restoration and color transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant