CN112150493B - Semantic guidance-based screen area detection method in natural scene - Google Patents
Semantic guidance-based screen area detection method in natural scene Download PDFInfo
- Publication number
- CN112150493B (grant of application CN202011004389.9A / CN202011004389A)
- Authority
- CN
- China
- Prior art keywords
- image
- screen
- edge
- module
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 19
- 238000003708 edge detection Methods 0.000 claims abstract description 71
- 238000003709 image segmentation Methods 0.000 claims abstract description 37
- 238000012549 training Methods 0.000 claims abstract description 11
- 230000004927 fusion Effects 0.000 claims description 46
- 238000000034 method Methods 0.000 claims description 31
- 230000006870 function Effects 0.000 claims description 19
- 230000009466 transformation Effects 0.000 claims description 11
- 238000000605 extraction Methods 0.000 claims description 8
- 238000007781 pre-processing Methods 0.000 claims description 8
- 238000012216 screening Methods 0.000 claims description 8
- 238000013528 artificial neural network Methods 0.000 claims description 7
- 238000012805 post-processing Methods 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 5
- 238000013526 transfer learning Methods 0.000 claims description 4
- 238000012937 correction Methods 0.000 claims description 3
- 230000001965 increasing effect Effects 0.000 claims description 3
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 230000010339 dilation Effects 0.000 claims 2
- 230000005284 excitation Effects 0.000 claims 1
- 239000000284 extract Substances 0.000 claims 1
- 238000004422 calculation algorithm Methods 0.000 abstract description 11
- 238000005516 engineering process Methods 0.000 abstract description 4
- 238000011160 research Methods 0.000 abstract description 3
- 238000010586 diagram Methods 0.000 description 6
- 238000013135 deep learning Methods 0.000 description 5
- 230000004913 activation Effects 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000013461 design Methods 0.000 description 3
- 230000009467 reduction Effects 0.000 description 3
- 239000013598 vector Substances 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G06T3/02—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4084—Transform-based scaling, e.g. FFT domain scaling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention addresses the problems of locating the screen position in a natural scene and of the coarse screen edges produced by edge detection techniques based on Fully Convolutional Networks (FCN), and provides a semantic-guidance-based screen area detection method for natural scenes. A semantically guided edge detection network is proposed for screen edge detection. The network is divided into two parts: one part, composed of deconvolution modules, completes an image segmentation task; the other part fuses feature maps of different scales and then performs the image edge detection task. The model is trained on the image segmentation and image edge detection tasks simultaneously, and the outputs of the two tasks are finally fused into the final edge image. In the screen area positioning stage, straight lines are detected in the edge image by the Hough Transform, coincident lines are removed, the screen corner points that meet the conditions are extracted, the tilt of the region is corrected by an Affine Transform, and the screen content image is finally obtained.
Description
Technical Field
The invention relates to the field of deep learning and computer vision, in particular to an edge detection network and a screen positioning method based on semantic guidance.
Background
With the advance of science and technology, the computing power of portable devices such as mobile phones keeps growing, and camera-equipped mobile devices are increasingly widespread, making it convenient to take photos and videos. People often need such devices to record important information shown on a screen, but when a screen is photographed, the background outside the screen is inevitably captured as well, and it strongly interferes with subsequent processing of the screen content.
On the other hand, when screen content is photographed with a portable device such as a mobile phone in a natural scene, many environmental factors inevitably interfere with the shot, and this interference degrades the accuracy of subsequent screen edge detection. A screen positioning technique suited to natural conditions is therefore needed to locate the screen accurately and reduce the interference of external noise on the analysis of the screen content. Research on screen positioning in natural scenes is still scarce, and further exploration of this topic is urgently needed.
In the field of computer vision, screens are usually detected with traditional edge detection methods: edge detection is applied to the whole image, and the target screen edge is then picked out from the many image edges by matching hand-crafted features. Traditional edge detection, however, has unavoidable drawbacks. On one hand, detecting every edge in the picture introduces many interfering edge pixels from the natural scene, which makes the subsequent search for the target edge via hand-crafted features harder. On the other hand, most traditional edge detectors require a manually set threshold to tune the sensitivity of edge detection: set too low, so many interfering edges are detected that hand-crafted feature matching becomes impossible; set too high, the desired screen edge is not detected at all.
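The threshold sensitivity described above can be illustrated with a minimal gradient-magnitude edge detector — a hedged numpy sketch on synthetic data, not the patent's method or any specific prior-art detector:

```python
import numpy as np

def sobel_edges(img, threshold):
    """Toy gradient-magnitude edge detector: Sobel filtering
    followed by a manually chosen threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    mag = np.hypot(gx, gy)
    return mag > threshold

# Synthetic scene: a bright "screen" rectangle over background noise.
rng = np.random.default_rng(0)
img = rng.normal(0, 10, (64, 64))   # noisy background
img[16:48, 16:48] += 200.0          # screen region

low = sobel_edges(img, 50)    # low threshold: noise edges leak in
high = sobel_edges(img, 400)  # high threshold: mostly the screen border

print(low.sum(), high.sum())  # the low threshold fires far more often
```

The same image yields very different edge maps depending on the threshold, which is exactly the manual-tuning problem the patent attributes to traditional detectors.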
Chinese patent application publication CN102236784A discloses detecting the edges of a screen by scanning suspected edges in the image with the Hough transform and fitting multiple lines. US patent application publication US20080266253A discloses a system for tracking a light spot in a computer projection area: the system binarizes the captured image and screens quadrilaterals from the binarized pixels to obtain the screen area. However, these algorithms that detect the screen edge with traditional methods cannot cope with the requirements of different scenes, and their resistance to interference is weak.
With the development of artificial intelligence, edge detection algorithms based on deep learning have been widely studied in recent years. With the proposal of edge detection algorithms based on deep Convolutional Neural Networks, such as the classical Holistically-Nested Edge Detection (HED) and Richer Convolutional Features (RCF), deep-learning-based detection has achieved good results, and its detection performance keeps improving along with the architectures of deep convolutional networks.
Meanwhile, considering that the image edges output by deep-learning-based edge detection networks tend to be coarse and blurry, the invention designs an edge detection network based on semantic guidance: by combining the image segmentation task with the image edge detection task, the rich semantic information of image segmentation is merged into edge detection to obtain a more refined screen edge image.
Disclosure of Invention
The aim of the invention is a method for obtaining the screen area in a natural scene through a semantically guided edge detection network and a screen area positioning algorithm. A screen area detection system is built on this method: the semantically guided edge detection network runs on a GPU module at the server side, while the screen edge corner screening algorithm used in the subsequent screen area positioning stage runs on a CPU module at the front end or client side. Separating the front and back ends reduces the computation at the front end and thus improves the screen detection efficiency of the system.
The invention provides a semantic-guidance-based method for detecting the screen area in a natural scene, comprising: an image preprocessing module that preprocesses images shot in natural scenes, including image denoising and contrast enhancement; and a semantically guided edge detection network, which fuses the rich semantic information of the predicted image from the image segmentation task with the final output prediction map of the image edge detection task, and applies deep supervision with the edge detection labels to obtain a refined edge detection image.
The invention mainly comprises two parts: a semantic-guided edge detection network and a screen edge corner screening algorithm. The method specifically comprises the following steps:
1. acquiring the scene screen image shot by the user's mobile phone, and preprocessing the natural scene image;
2. constructing an edge detection network based on semantic guidance;
3. pre-training the network by utilizing open source data and simulation data in related fields;
4. fine-tuning the pre-trained neural network, in a transfer-learning manner, with a small self-made dataset of annotated natural-scene screen images;
5. performing screen edge detection with the transfer-learned network on the screen edge data prepared in the test set, obtaining the final screen edge image;
6. post-processing the refined screen edge image produced by the edge detection neural network: removing repeated lines and non-edge lines, and screening out the four most likely screen corner points by combining the screen edge characteristics;
7. after the screen edge feature screening algorithm obtains the screen corner points with the highest confidence, an affine transformation is used to adjust the image tilt angle. The affine transformation is expressed as

$$\vec{x}' = A\vec{x} + \vec{b}$$

where $\vec{x}$ and $\vec{b}$ are the pixel-coordinate vector and the translation vector respectively, and $A$ is the affine matrix expressing the rotation, magnification and scaling of the image. In homogeneous coordinates this is equivalent to

$$\begin{pmatrix} \vec{x}' \\ 1 \end{pmatrix} = \begin{pmatrix} A & \vec{b} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \vec{x} \\ 1 \end{pmatrix}$$

The affine transformation maps each pixel vector $\vec{x}$ of the region inside the original screen image to the angle of an upright screen; the pixel vector becomes $\vec{x}'$, completing the angle-correction transformation.
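The equivalence of the two-step form $\vec{x}' = A\vec{x} + \vec{b}$ and the single homogeneous-coordinate matrix can be checked with a small numpy sketch; the 30° rotation and the translation vector here are illustrative values, not the patent's correction matrix:

```python
import numpy as np

# Affine map x' = A x + b, written as one 3x3 matrix in
# homogeneous coordinates.
theta = np.deg2rad(30)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # pure rotation
b = np.array([5.0, -2.0])                        # translation

M = np.eye(3)
M[:2, :2] = A          # top-left block: A
M[:2, 2] = b           # last column: b

x = np.array([1.0, 0.0])
x_h = np.append(x, 1.0)           # lift to homogeneous coordinates
x_prime = (M @ x_h)[:2]           # apply and drop the trailing 1

# Same result as applying A and b directly:
assert np.allclose(x_prime, A @ x + b)
print(x_prime)
```

Working in homogeneous coordinates lets rotation, scaling and translation compose by plain matrix multiplication, which is why correction pipelines build a single matrix `M`.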
The edge detection network based on semantic guidance in the steps above is the main content of the invention. It is a two-channel neural network structure based on a fully convolutional neural network; through the two channels the network learns both the image segmentation and the image edge detection task. It comprises a feature extraction module, an image segmentation module, an image edge detection module and a semantic guidance fusion module.
The feature extraction module is a fully convolutional network formed by removing the fully connected layers of VGG16. To enlarge the receptive field of the network without losing much local information, Hybrid Dilated Convolution (HDC) is added to the last two convolutional stages: a group of three convolution kernels with different dilation rates is applied in sequence, which enlarges the receptive field while reducing the gridding holes produced by dilated convolution.
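The receptive-field gain from the hybrid dilated group can be checked with a short calculation. This is a sketch assuming stride-1 3×3 convolutions; the rate sequence [1, 2, 5] is an HDC-style example, since the patent only states that three different dilation rates are used:

```python
def receptive_field(kernel_size, dilation_rates):
    """Effective receptive field of stacked stride-1 dilated
    convolutions: each layer adds (k - 1) * rate pixels."""
    rf = 1
    for rate in dilation_rates:
        rf += (kernel_size - 1) * rate
    return rf

# Three 3x3 convolutions, plain vs. hybrid dilated:
plain = receptive_field(3, [1, 1, 1])   # -> 7
hdc = receptive_field(3, [1, 2, 5])     # -> 17
print(plain, hdc)
```

Choosing rates without a common factor (e.g. 1, 2, 5 rather than 2, 2, 2) is what avoids the "holes" (gridding) the text mentions: consecutive layers fill in the pixels a single large dilation would skip.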
In the image segmentation module, a deconvolution channel is built at the left end of the network. Up-sampling is performed through four deconvolution layers, deconvolving the final high-level semantic feature map of the backbone to the size of the original image; deep supervision with image segmentation labels then drives the network to train on the segmentation task, and a segmented image at original resolution is finally output.
The image edge detection module fuses image features through a multi-scale Feature Fusion Module with an attention mechanism, built from an SE-ResNeXt module that combines SE Block and ResNeXt Block. After the feature map output by each Block of the backbone enters the multi-scale feature fusion module, the feature maps of different scales pass through the SE-ResNeXt module: a ResNeXt operation with a residual grouped-convolution structure enriches the semantic information of the input feature map, which is then fed into the SE module, where each channel is given a learnable weight so that the model actively learns the importance of each channel of the feature map, promoting useful features and suppressing features that are useless for the current task. Finally, deep supervision with the image edge labels drives the network to train on the edge detection task, and an edge image at original resolution is output.
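The channel-reweighting idea of the SE part of this module can be sketched in a few lines of numpy. This is a hedged illustration, not the patent's trained network: the channel count, reduction ratio r=4, and random weights are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(u, w1, w2):
    """Squeeze-and-Excitation: global-average-pool each channel,
    pass the pooled vector through a two-layer bottleneck, and
    rescale the channels with the resulting weights."""
    z = u.mean(axis=(1, 2))                    # squeeze: z in R^C
    s = sigmoid(w2 @ np.maximum(0, w1 @ z))    # excitation: weights in (0, 1)
    return u * s[:, None, None], s             # scale channel-wise

C, r = 8, 4                                    # channels, reduction ratio (assumed)
u = rng.normal(size=(C, 16, 16))               # input feature map
w1 = rng.normal(scale=0.1, size=(C // r, C))   # bottleneck FC weights
w2 = rng.normal(scale=0.1, size=(C, C // r))

out, s = se_block(u, w1, w2)
print(out.shape, s.round(3))
```

Because `w1`/`w2` are learned, the network decides per-channel importance from data — the "promote useful features, suppress useless ones" behaviour described above.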
The semantic guidance fusion module fuses the image features extracted by the edge detection module and the image segmentation module; the semantic features extracted by the image segmentation module guide the model to output more precise image edge features. The outputs of the two tasks are concatenated along the channel dimension and reduced in dimension, fusing the rich semantic information of image segmentation into the image edge detection task and yielding a fine edge detection result.
Further, to train the network better, a weighted cross-entropy loss is adopted, so that the labels fully supervise the feature maps of every layer. The loss function of the m-th layer is expressed as:

$$L^{(m)}(W) = -\frac{|Y^-|}{|Y|} \sum_{j \in Y^+} \log \Pr(x_j; W) \; - \; \frac{|Y^+|}{|Y|} \sum_{j \in Y^-} \log\bigl(1 - \Pr(x_j; W)\bigr)$$

where $\Pr(x_j; W)$ is the activation value of feature-map pixel $x_j$ in the prediction map of the m-th layer, given by the activation function $a_j = \mathrm{sigmoid}(x_j)$; $|Y^+|$ and $|Y^-|$ denote the sets of Ground Truth pixels that are and are not on the screen-area edge, and $W$ denotes all trainable parameters of the network.
When the j-th pixel value of each layer passes through the multi-scale feature fusion module on the left, the per-layer weighting is expressed as

$$L_{side}(W) = \sum_{m=1}^{5} w_m \, L^{(m)}(W)$$

where $w_1 = w_2 = w_3 = w_4 = 0.2$ and $w_5 = 0.28$.
Combining the losses of the layers above, the loss function of the fusion layer is expressed as

$$L^{(fusion)}(W) = -\frac{|Y^-|}{|Y|} \sum_{j \in Y^+} \log \hat{y}^{(fusion)}_j \; - \; \frac{|Y^+|}{|Y|} \sum_{j \in Y^-} \log\bigl(1 - \hat{y}^{(fusion)}_j\bigr)$$

where $\hat{Y}^{(fusion)} = \mathrm{sigmoid}(A^{(fusion)})$ and $A^{(fusion)} = \{a_j^{(side)} \mid j = 1, 2, \ldots, |Y|\}$ is the set of output values of each layer.
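The class-balanced cross-entropy described above can be sketched in numpy. This is a hedged, generic HED-style balanced loss on a made-up prediction, not the trained network's output:

```python
import numpy as np

def balanced_bce(pred, gt, eps=1e-7):
    """Class-balanced binary cross-entropy: the rare edge pixels get
    weight |Y-|/|Y|, the abundant non-edge pixels get |Y+|/|Y|."""
    pred = np.clip(pred, eps, 1 - eps)
    y_pos = gt == 1
    y_neg = gt == 0
    beta = y_neg.sum() / gt.size          # |Y-| / |Y|
    loss = -(beta * np.log(pred[y_pos]).sum()
             + (1 - beta) * np.log(1 - pred[y_neg]).sum())
    return loss

gt = np.zeros((8, 8))
gt[3, :] = 1                              # one thin "edge" row
good = np.where(gt == 1, 0.9, 0.1)        # confident, correct prediction
bad = np.full_like(gt, 0.5)               # uninformative prediction

print(balanced_bce(good, gt) < balanced_bce(bad, gt))  # True
```

Without the `beta` weighting, a network could trivially minimize the loss by predicting "no edge" everywhere, since edge pixels are a tiny fraction of the image — the balancing is what makes thin-edge supervision work.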
Finally the image segmentation task and the image edge task are fused and their final losses are added; the final loss function is expressed as:

$$L_{fusion} = L^{(edge\_fusion)} + L^{(seg\_fusion)}$$

Using the sum of the two losses as the final loss lets the network fuse the rich semantic information of the image segmentation task better, and makes the model converge faster during training.
Furthermore, the screen corner screening algorithm mainly uses the refined screen edge image produced by the proposed edge detection network to screen screen corner points. First, straight lines are detected by the Hough transform and repeated lines are removed with a line deduplication method; all line intersections are put into a set, every four intersections are ranked by the area and perimeter they enclose, and the edge lines enclosing the largest area with the longest perimeter are selected as defining the corner points of the screen image. The line deduplication method is: set a distance threshold $T_d$ and an angle threshold $T_\theta$; if the distance between any two lines is less than $T_d$ and the angle difference between them is less than $T_\theta$, delete the shorter of the two lines. Finally, an affine transformation applied to the obtained screen edge corner points corrects the angle of the screen area, yielding the screen content image.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. the invention designs an edge detection network guided by semantic information, using the semantic information obtained in the image segmentation task to guide the model in predicting image edges. The network makes full use of the rich semantic features of the segmentation task: a series of deconvolutions upsamples the important image features extracted by the backbone to the original image size, and deep supervision with the segmentation labels finally yields the segmented image. The edge image obtained by multi-scale fusion at the right end is then fused with the segmented image, and an edge image label is added for deep supervision, so that the high-level semantic features are fully exploited and a more refined edge image is obtained.
2. the invention provides a multi-scale Feature Fusion Module with an attention mechanism, which fuses the feature maps of different scales output by the backbone and merges the multi-scale feature information into the edge image. By adding an SE-ResNeXt module to the fusion module, the feature map is first sent into a ResNeXt branch with a residual grouped-convolution structure to enrich its semantic information, and then into the SE module, where each channel is given a learnable weight so that the model actively learns the importance of each channel of the feature map.
Drawings
In order to make the purpose, technical scheme and beneficial effect of the invention more clear, the invention provides the following drawings for description:
FIG. 1 is a schematic flow chart of a method for detecting a screen area in a natural scene based on semantic guidance according to the present invention;
FIG. 2 is a schematic flow diagram of an image edge detection network module incorporating semantic information guidance according to the present invention;
FIG. 3 is a schematic diagram of the network architecture for image edge detection with semantic information guidance;
FIG. 4 is a multi-scale feature fusion module of the present invention with attention mechanism;
FIG. 5 is a schematic diagram of a post-processing flow of the screen region detection method of the present invention.
Detailed description of the preferred embodiments
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a semantic-guidance-based screen region detection algorithm in a natural scene, which, as shown in FIG. 1, specifically comprises the following steps:
step 1, preprocessing the natural-scene screen image captured by the user;
step 2, constructing an edge detection neural network that fuses image segmentation semantic information, and inputting the image into the network to detect the screen edge in the natural scene;
step 3, selecting the four screen corner points in the screen edge image with the screen edge corner screening algorithm, and recording the positions of the corner points;
step 4, correcting the tilt angle of the screen region by affine transformation using the recorded corner points;
and step 5, cropping the screen region image after the affine transformation to obtain the final screen content image.
This embodiment provides the concrete implementation steps of the semantic-guidance-based method for detecting the screen area in a natural scene. The screen area detection module in a natural scene comprises: an image preprocessing module, an edge detection module, a screen area positioning module, an affine transformation module and a content acquisition module.
Step 1: obtain the scene screen image shot by a mobile phone, input it into the preprocessing module, and preprocess it with operations such as denoising and contrast enhancement to strengthen the edge characteristics of the input image.
Step 2: input the preprocessed image into the edge detection module, in which a semantically guided edge detection network is constructed as shown in FIG. 2, consisting of a feature extraction module, an image segmentation module, an edge detection module and a semantic fusion module. The feature extraction module is the backbone of the edge detection network, a VGG16 network with the fully connected layers removed; the image segmentation module performs the segmentation task with the semantic features extracted by the feature extraction module, supervised by the image segmentation labels; the edge detection module performs the edge detection task with the per-layer detail features extracted by the feature extraction module, supervised by the image edge labels; and the semantic fusion module performs semantically guided fusion of the semantic features from the image segmentation module and the edge features from the edge detection module to obtain the final edge image.
Step 3: build the edge detection network with the TensorFlow framework. As shown in FIG. 3, the image segmentation channel of the network deconvolves the important image features extracted by the backbone to the original image size with a series of deconvolutions and applies deep supervision with the image segmentation labels, finally obtaining the segmented image. The multi-scale feature fusion module used by the image edge detection channel fuses the multi-scale output feature maps of the backbone and applies deep supervision with the image edge labels. Finally, the edge image obtained by multi-scale fusion at the right end is fused with the segmented image, and an edge image label is added for deep supervision, so that the high-level semantic features are fully used and a more refined edge image is obtained.
Step 4: in the edge detection module of the network, construct a multi-scale Feature Fusion Module, which fuses the feature maps of different scales output by the backbone and merges the multi-scale feature information into the edge image. As shown in FIG. 3, the multi-scale feature module receives feature maps of different scales output by the backbone Blocks and performs residual learning and channel-weight learning through the SE-ResNeXt module, so that the output feature information is richer and, among all received channels, those carrying important feature information are distinguished while unimportant feature channels are suppressed.
Finally, a 1×1 convolution for dimension reduction and an up-sampling operation are applied to the feature maps of different scales, and the feature maps from the 5 channels are concatenated along the channel dimension to obtain an output feature map of original image size with 5 channels. The corresponding weights of the 5 channels are learned through an SE Block to distinguish the importance of each channel; a final 1×1 convolution reduces the dimension to yield the final output of the image edge detection task, supervised by the edge detection labels.
Weight learning for each channel of the input feature map is performed with SE Block. The learned weight information is denoted $z \in \mathbb{R}^C$, produced by squeezing each channel map $u_c$ of spatial size $W \times H$. The c-th element of $z$ is computed as

$$z_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j)$$

The output $z_c$ can be regarded as a piece of description information for the channel map's weight, representing the weight carried by the current channel.
The features of each layer of the backbone are fused through the multi-scale feature fusion module, important feature information is distinguished from unimportant information through SE Block, and the predicted edge image of the edge detection module is finally output.
Step 5: define the loss function of the semantically guided edge detection network, fusing the semantic features extracted by the image segmentation module with the image edge features extracted by edge detection, and train the network with the newly defined loss. To make training more sufficient, a weighted cross-entropy loss is adopted, so that the labels fully supervise the feature maps of every layer. The loss function of the m-th layer is expressed as:

$$L^{(m)}(W) = -\frac{|Y^-|}{|Y|} \sum_{j \in Y^+} \log \Pr(x_j; W) \; - \; \frac{|Y^+|}{|Y|} \sum_{j \in Y^-} \log\bigl(1 - \Pr(x_j; W)\bigr)$$

where $\Pr(x_j; W)$ is the activation value of feature-map pixel $x_j$ in the prediction map of the m-th layer, given by the activation function $a_j = \mathrm{sigmoid}(x_j)$; $|Y^+|$ and $|Y^-|$ denote the sets of Ground Truth pixels on the screen-area edge and off it, and $W$ denotes all trainable parameters of the network.
When the j-th pixel value of each layer passes through the multi-scale feature fusion module on the left, the per-layer weighting is expressed as

$$L_{side}(W) = \sum_{m=1}^{5} w_m \, L^{(m)}(W)$$

where $w_1 = w_2 = w_3 = w_4 = 0.2$ and $w_5 = 0.28$.
Combining the losses of the layers above, the loss function of the fusion layer is expressed as

$$L^{(fusion)}(W) = -\frac{|Y^-|}{|Y|} \sum_{j \in Y^+} \log \hat{y}^{(fusion)}_j \; - \; \frac{|Y^+|}{|Y|} \sum_{j \in Y^-} \log\bigl(1 - \hat{y}^{(fusion)}_j\bigr)$$

where $\hat{Y}^{(fusion)} = \mathrm{sigmoid}(A^{(fusion)})$ and $A^{(fusion)} = \{a_j^{(side)} \mid j = 1, 2, \ldots, |Y|\}$ is the set of output values of each layer.
Finally, the image segmentation task and the image edge task are fused and their final losses are added; the final loss function is expressed as:

$$L = L^{(edge\_fusion)} + L^{(seg\_fusion)}$$

The network is trained under supervision with dual labels; VGG16 is adopted as the backbone, with the backbone weights shared between the two tasks, and the fine image screen edge is finally obtained by fusing the two tasks.
Step 6: train the constructed edge detection network. By transfer learning, the network is pre-trained with open-source and simulated data from related fields, and the pre-trained network is then fine-tuned with the self-made annotated screen dataset.
Step 7: save the trained edge detection network, deploy it to the GPU module of the server, and put it into a port-listening state. When the client sends an input image through the listening port, the edge detection network deployed on the server automatically runs inference to obtain the edge image corresponding to the input image, and returns the edge image to the client through the corresponding port.
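The client/server exchange in this step can be mimicked with a toy sketch. Everything below is illustrative: the raw-socket protocol, the single-request server, and the byte-inversion placeholder standing in for GPU inference are assumptions, not the patent's actual service.

```python
import queue
import socket
import threading

def edge_server(port_queue, host="127.0.0.1"):
    """Toy stand-in for the deployed service: listen on a port, read an
    image payload, run a placeholder 'edge predictor', send the result back.
    The real system would run the trained network on the server's GPU here."""
    srv = socket.socket()
    srv.bind((host, 0))                   # port 0: let the OS pick a free port
    srv.listen(1)
    port_queue.put(srv.getsockname()[1])  # hand the chosen port to the client
    conn, _ = srv.accept()
    data = conn.recv(1 << 16)             # input image bytes from the client
    edges = bytes(255 - b for b in data)  # placeholder "inference": invert bytes
    conn.sendall(edges)
    conn.close()
    srv.close()

# Client side: send a (tiny fake) image, receive the "edge map" back.
q = queue.Queue()
threading.Thread(target=edge_server, args=(q,), daemon=True).start()
cli = socket.create_connection(("127.0.0.1", q.get()))
cli.sendall(bytes([0, 100, 255]))
result = cli.recv(1 << 16)
cli.close()
```

In practice a production deployment would use an HTTP or RPC framework rather than raw sockets, but the request/response shape matches the port-listening behavior described above.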
Step 8: predict the screen edge image in the natural scene. Call the server-side edge detection network, input the preprocessed image, and receive the refined screen edge image in return.
Step 9: perform post-processing on the screen edge image (a schematic diagram of the post-processing flow is shown in fig. 5). First, Hough-transform line detection is applied to the screen edge image via the OpenCV library, obtaining the screen-edge lines in all similar directions in the edge image.
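For illustration, the Hough line detection that OpenCV provides (`cv2.HoughLines`) can be sketched from first principles in NumPy; the accumulator resolution (one-degree angle bins, one-pixel distance bins) and the vote threshold below are illustrative choices, not parameters from the patent.

```python
import numpy as np

def hough_lines(edge_img, n_theta=180, threshold=50):
    """Vote in (rho, theta) space for every edge pixel; accumulator peaks
    are lines. Returns a list of (rho, theta_in_degrees) pairs."""
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))       # maximum possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))   # angles 0 .. 179 degrees
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_img)
    for x, y in zip(xs, ys):
        # each edge pixel votes once per angle: rho = x*cos(t) + y*sin(t)
        rhos = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1
    peaks = np.argwhere(acc >= threshold)
    return [(int(r) - diag, float(np.rad2deg(thetas[t]))) for r, t in peaks]
```

A long straight screen edge concentrates many votes in a single (rho, theta) cell, which is why the transform is robust to the small gaps a learned edge map may contain.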
Step 10: remove coincident lines from the detected lines. The de-duplication method is: set a distance threshold $T_d$ and an angle threshold $T_\theta$; if the distance between any two lines is less than $T_d$ and the angle difference between them is less than $T_\theta$, delete the shorter of the two lines.
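A minimal sketch of this de-duplication rule, assuming each line is represented as a (rho, theta-in-degrees, length) triple; processing the lines longest-first guarantees that the shorter member of any coincident pair is the one dropped, as the step requires. The threshold defaults are illustrative.

```python
def dedupe_lines(lines, t_d=10.0, t_theta=5.0):
    """Drop the shorter of any two lines whose distance difference is below
    t_d AND whose angle difference is below t_theta.
    lines: iterable of (rho, theta_degrees, length) tuples."""
    kept = []
    for rho, theta, length in sorted(lines, key=lambda l: -l[2]):  # longest first
        coincident = any(abs(rho - r) < t_d and abs(theta - t) < t_theta
                         for r, t, _ in kept)
        if not coincident:
            kept.append((rho, theta, length))
    return kept
```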
Step 11: sort the intersections of the remaining lines into a set, take four points at a time, and compute the perimeter and the enclosed area; the four points for which these are maximal are taken as the screen-edge corner points in the natural scene.
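This corner search can be sketched as an exhaustive scan over four-point subsets. Combining perimeter and shoelace area into one additive score, and ordering each candidate quad counter-clockwise around its centroid, are illustrative choices; the patent only states that the maximal perimeter and enclosed area identify the corners.

```python
import math
from itertools import combinations

def order_quad(pts):
    """Order 4 points counter-clockwise around their centroid so that
    perimeter and shoelace area are computed on a simple polygon."""
    cx = sum(p[0] for p in pts) / 4.0
    cy = sum(p[1] for p in pts) / 4.0
    return sorted(pts, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def perimeter(pts):
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))

def shoelace_area(pts):
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1]
            - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))
    return abs(s) / 2.0

def pick_screen_corners(intersections):
    """Among all 4-point subsets of the line intersections, return the quad
    maximizing perimeter + enclosed area."""
    best, best_score = None, float("-inf")
    for quad in combinations(intersections, 4):
        q = order_quad(list(quad))
        score = perimeter(q) + shoelace_area(q)
        if score > best_score:
            best, best_score = q, score
    return best
```

With the handful of intersections that survive de-duplication, the combinatorial search is cheap; spurious inner intersections lose to the outer screen corners because they enclose strictly less area.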
Step 12: correct the screen's tilt angle using the screen corner points and an affine transformation, finally obtaining the screen content image.
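The tilt correction can be illustrated by solving for the 2x3 affine matrix from three of the detected corners (an affine map is fully determined by three point correspondences; OpenCV's `cv2.getAffineTransform` does the equivalent). The corner coordinates in the demonstration are made-up values.

```python
import numpy as np

def affine_from_corners(src3, dst3):
    """Solve for M (2x3) such that M @ [x, y, 1]^T = [x', y']^T for three
    source/destination point pairs."""
    A = np.array([[x, y, 1.0] for x, y in src3])  # 3x3 system matrix
    B = np.array(dst3, dtype=float)               # 3x2 target coordinates
    return np.linalg.solve(A, B).T                # 2x3 affine matrix

def warp_points(M, pts):
    """Apply the affine matrix to a list of (x, y) points."""
    P = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))])
    return (M @ P.T).T
```

Mapping three screen corners onto the corners of an upright rectangle determines the whole transform; warping the image (or the remaining corner) with it yields the deskewed screen content.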
Claims (5)
1. A method for detecting a screen area in a natural scene based on semantic guidance, characterized in that a screen picture shot in a natural scene is processed to obtain the screen content, specifically comprising the following steps:
step 1, acquiring a scene screen image shot by a mobile phone of a user, and preprocessing an input image;
step 2, constructing an edge detection network based on semantic guidance, comprising a feature extraction module, an image segmentation module, an image edge detection module, and a semantic guidance fusion module; the image segmentation module constructs an extended path through deconvolution to extract image semantic features and segment the image; the image edge detection module extracts and fuses edge features through a multi-scale Feature Fusion Module with an attention mechanism; the semantic guidance fusion module fuses the semantic features extracted by the image segmentation module with the edge features of the image edge detection module to obtain a refined edge image under semantic guidance;
step 3, fine-tuning the network by using a self-made screen edge data set in a transfer learning mode;
step 4, performing screen edge detection on the input image with the trained neural network to obtain a screen edge image;
and 5, performing post-processing operation by using the obtained screen edge image, screening out four screen corner points in the image by combining with the screen edge characteristics, and performing inclination angle correction through affine transformation to obtain a final screen content image.
2. The method as claimed in claim 1, characterized in that the feature extraction module is a fully convolutional network formed by removing the fully connected layers of VGG16; to enlarge the network's receptive field without losing a large amount of local information, a Hybrid Dilated Convolution scheme is added to the last two convolutional layers, in which a group of three convolution kernels with different dilation rates is applied in sequence, so that the holes produced by dilated convolution are reduced while the receptive field is enlarged.
3. The method for detecting a screen area in a natural scene based on semantic guidance as claimed in claim 1, characterized in that the image edge detection module performs image feature fusion through a multi-scale Feature Fusion Module with an attention mechanism, which uses an SE-ResNeXt module obtained by combining an SE block with a ResNeXt block; after entering the multi-scale feature fusion module, the feature maps output at different scales by each block of the backbone network pass through the SE-ResNeXt module: a ResNeXt operation with a residual group-convolution structure enriches the semantic information of the input feature map, and a Squeeze-and-Excitation (SE) operation then assigns a learnable weight to each channel, so that the model actively learns the importance of each channel of the feature map and, according to that importance, can promote useful features and suppress features that are of little use to the current task.
4. The method for detecting a screen area in a natural scene based on semantic guidance as claimed in claim 1, characterized in that the semantic guidance fusion module fuses the image features extracted by the edge detection module and the image segmentation module, using the semantic features extracted by the image segmentation module to guide the model to output finer image edge features; a new model loss function is defined in the semantic guidance fusion module to fuse the two kinds of output feature information, trained under the guidance of the edge labels, and the newly defined loss function is expressed as:
$$L = L_{fusion}\big(f(F_{seg}, F_{edge} \mid X; W);\, W_f\big)$$

where $F_{seg}$ are the semantic features extracted by the image segmentation module, $F_{edge}$ are the edge features extracted by the image edge detection module, $f(\ast \mid W)$ denotes the feature-map fusion operation with convolution parameters $W$, and $L_{fusion}(F; W_f)$ is the cross-entropy function employed:

$$L_{fusion}(F; W_f) = -\frac{1}{N} \sum_{i=1}^{N} \log \Pr\big(y_i \mid F_i; W_f\big)$$

where $F_i$ is the $i$-th pixel of the feature map, $\Pr(y_i \mid F_i)$ is the predicted probability of label $y_i$ at that pixel, $N$ is the total number of image pixels, and $W_f$ is the set of training parameters of the image segmentation task.
5. The method for detecting a screen area in a natural scene based on semantic guidance as claimed in claim 1, characterized in that the post-processing of the screen edge image mainly comprises: performing Hough-transform line detection on the screen edge image; removing coincident lines; sorting the line intersections into a set, taking four points at a time, and computing the perimeter and enclosed area, the four points for which these are maximal being taken as the screen-edge corner points in the natural scene; and finally correcting the screen's tilt angle using the screen corner points and an affine transformation to finally obtain the screen content image.
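The Squeeze-and-Excitation channel re-weighting that claim 3 describes can be sketched in NumPy. The two bottleneck fully connected layers with a ReLU/sigmoid pairing are the standard SE design; the identity weights and tiny feature map in the demonstration are illustrative values, not parameters from the patent.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation over a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) are the two bottleneck FC layers."""
    z = x.mean(axis=(1, 2))                    # squeeze: global avg pool -> (C,)
    s = np.maximum(w1 @ z + b1, 0.0)           # excitation FC 1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # excitation FC 2 + sigmoid, in (0, 1)
    return x * s[:, None, None]                # rescale each channel by its weight

# Tiny demonstration: 2 channels, identity excitation weights (ratio r = 1).
x = np.stack([np.ones((2, 2)), np.zeros((2, 2))])  # channel 0 active, channel 1 silent
out = se_block(x, np.eye(2), np.zeros(2), np.eye(2), np.zeros(2))
```

The learned per-channel weight lies in (0, 1), so informative channels are passed through nearly unchanged while uninformative ones are attenuated, which is the "promote useful features and suppress useless ones" behavior the claim describes.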
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011004389.9A CN112150493B (en) | 2020-09-22 | 2020-09-22 | Semantic guidance-based screen area detection method in natural scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112150493A CN112150493A (en) | 2020-12-29 |
CN112150493B true CN112150493B (en) | 2022-10-04 |
Family
ID=73897546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011004389.9A Active CN112150493B (en) | 2020-09-22 | 2020-09-22 | Semantic guidance-based screen area detection method in natural scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112150493B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700462A (en) * | 2020-12-31 | 2021-04-23 | 北京迈格威科技有限公司 | Image segmentation method and device, electronic equipment and storage medium |
CN112784718B (en) * | 2021-01-13 | 2023-04-25 | 上海电力大学 | Insulator state identification method based on edge calculation and deep learning |
CN112950615B (en) * | 2021-03-23 | 2022-03-04 | 内蒙古大学 | Thyroid nodule invasiveness prediction method based on deep learning segmentation network |
CN112966691B (en) * | 2021-04-14 | 2022-09-16 | 重庆邮电大学 | Multi-scale text detection method and device based on semantic segmentation and electronic equipment |
CN112926551A (en) * | 2021-04-21 | 2021-06-08 | 北京京东乾石科技有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN113192060A (en) * | 2021-05-25 | 2021-07-30 | 上海商汤临港智能科技有限公司 | Image segmentation method and device, electronic equipment and storage medium |
CN113469199A (en) * | 2021-07-15 | 2021-10-01 | 中国人民解放军国防科技大学 | Rapid and efficient image edge detection method based on deep learning |
CN113344827B (en) * | 2021-08-05 | 2021-11-23 | 浙江华睿科技股份有限公司 | Image denoising method, image denoising network operation unit and device |
CN114882091B (en) * | 2022-04-29 | 2024-02-13 | 中国科学院上海微系统与信息技术研究所 | Depth estimation method combining semantic edges |
CN115512368A (en) * | 2022-08-22 | 2022-12-23 | 华中农业大学 | Cross-modal semantic image generation model and method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105163078A (en) * | 2015-09-01 | 2015-12-16 | 电子科技大学 | Screen removal intelligent video monitoring system |
CN108734719A (en) * | 2017-04-14 | 2018-11-02 | 浙江工商大学 | Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks |
CN108830855A (en) * | 2018-04-02 | 2018-11-16 | 华南理工大学 | A kind of full convolutional network semantic segmentation method based on the fusion of multiple dimensioned low-level feature |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11188794B2 (en) * | 2017-08-10 | 2021-11-30 | Intel Corporation | Convolutional neural network framework using reverse connections and objectness priors for object detection |
- 2020-09-22 CN CN202011004389.9A patent/CN112150493B/en active Active
Non-Patent Citations (2)
Title |
---|
Richer Convolutional Features for Edge Detection; LIU Y; Proceedings of IEEE Conference on Computer Vision and Pattern Recognition; 2018-10-31; Vol. 41, No. 8; full text *
Edge detection with cross-layer feature fusion based on RCF; SONG Jie; Journal of Computer Applications; 2020-07-10; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112150493B (en) | Semantic guidance-based screen area detection method in natural scene | |
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN111931684B (en) | Weak and small target detection method based on video satellite data identification features | |
CN110059586B (en) | Iris positioning and segmenting system based on cavity residual error attention structure | |
CN111368846B (en) | Road ponding identification method based on boundary semantic segmentation | |
CN111738055B (en) | Multi-category text detection system and bill form detection method based on same | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN113591968A (en) | Infrared weak and small target detection method based on asymmetric attention feature fusion | |
CN113592911B (en) | Apparent enhanced depth target tracking method | |
CN113052170A (en) | Small target license plate recognition method under unconstrained scene | |
CN112329784A (en) | Correlation filtering tracking method based on space-time perception and multimodal response | |
CN113205103A (en) | Lightweight tattoo detection method | |
CN113763417B (en) | Target tracking method based on twin network and residual error structure | |
CN111582057B (en) | Face verification method based on local receptive field | |
CN113627481A (en) | Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens | |
CN110910497B (en) | Method and system for realizing augmented reality map | |
CN108765384B (en) | Significance detection method for joint manifold sequencing and improved convex hull | |
CN116758340A (en) | Small target detection method based on super-resolution feature pyramid and attention mechanism | |
Li et al. | A new algorithm of vehicle license plate location based on convolutional neural network | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN113052311B (en) | Feature extraction network with layer jump structure and method for generating features and descriptors | |
CN115035429A (en) | Aerial photography target detection method based on composite backbone network and multiple measuring heads | |
Lin et al. | Ml-capsnet meets vb-di-d: A novel distortion-tolerant baseline for perturbed object recognition | |
CN114202694A (en) | Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning | |
Zhou et al. | A lightweight object detection framework for underwater imagery with joint image restoration and color transformation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||