CN113850284B - Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction - Google Patents


Info

Publication number
CN113850284B
CN113850284B (application CN202110751853.9A)
Authority
CN
China
Prior art keywords
network
layer
feature
branch prediction
fusion
Prior art date
Legal status
Active
Application number
CN202110751853.9A
Other languages
Chinese (zh)
Other versions
CN113850284A
Inventor
Gan Yongdong
Zhu Xinshan
Wang Jiayu
Sun Hao
Zhang Yun
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202110751853.9A priority Critical patent/CN113850284B/en
Publication of CN113850284A publication Critical patent/CN113850284A/en
Application granted granted Critical
Publication of CN113850284B publication Critical patent/CN113850284B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 Pattern recognition: fusion techniques of extracted features
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • Y02T10/40 Engine management systems


Abstract

The invention relates to a multi-operation detection method based on multi-scale feature fusion and multi-branch prediction, and belongs to the technical field of multimedia forensics. The prior art generally detects and locates only a single type of operation in an image. The method constructs a multi-operation detection network that extracts composite operation features with a residual-block convolutional stream, performs multi-scale feature fusion, and realizes multi-operation detection through a multi-branch prediction module. A model trained with this detection network can detect and locate multiple types of operations, and shows a degree of robustness to post-processing operations such as noise addition, scaling, blurring, and secondary compression.

Description

Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction
Technical Field
The invention belongs to the technical field of multimedia forensics, and in particular relates to a multi-operation detection method based on multi-scale feature fusion and multi-branch prediction.
Background
With the rapid development of computer and Internet technologies, digital multimedia information is widely used across social production and daily life, such as broadcasting, television, news, games, tickets, physical evidence, books, and documents, greatly enriching social life. However, with editing software such as Photoshop, CorelDRAW, and Meitu, multimedia files can be easily edited and modified, leading to serious information-security problems (integrity, confidentiality, availability) and even serious threats to social stability. As a promising countermeasure, the basic idea of digital forensics is to extract the unique trace left by an operation from a multimedia file in order to determine whether the file has undergone that operation. Much research effort has been devoted to detecting JPEG compression, histogram equalization, noise addition, blurring, median filtering, resampling, copy-move, and similar operations.
Traditional methods detect tampering operations based on statistical features. JPEG compression is a common image-processing step applied by almost all digital imaging devices. Li et al. exploit the fact that a spliced JPEG composite often exhibits inconsistent quality factors or inconsistent block positions to detect whether an image is a JPEG composite [Li, Zhang Xinpeng. Detecting composite images using JPEG compression characteristics [J]. Journal of Applied Sciences, 2008(03): 281-287]. However, this method only applies to spliced JPEG composites and cannot handle full-image JPEG compression. Lin et al. observe that the relationship between co-located discrete cosine transform (DCT) coefficients in different blocks of an image is invariant before and after JPEG compression [Lin, C.Y. and Chang, S.F. A robust image authentication method distinguishing JPEG compression from malicious manipulation. IEEE Transactions on Circuits and Systems for Video Technology, 11(2) (2001), 153-168]. This method can distinguish whether a given image block has been maliciously tampered with, and is robust to JPEG compression. Fan et al. statistically model images of different operation types with a Gaussian mixture model (GMM) and extract generic features to detect different types of image operations [W. Fan, K. Wang, and F. Cayre. General-purpose image forensics using patch likelihood under image statistical models. In IEEE International Workshop on Information Forensics and Security (WIFS), pages 1-6, Nov. 2015]. That approach requires constructing multiple binary classifiers for detection, making it cumbersome and not very robust. Gallagher proposed a method for detecting image resampling [A.C. Gallagher. Detection of linear and cubic interpolation in JPEG compressed images. In The 2nd Canadian Conference on Computer and Robot Vision (CRV'05), pp. 65-72, Victoria, BC, Canada, 2005]. It first computes the second-order difference of the image, then judges whether the image has been resampled from the peaks in the Fourier-transform spectrum of each row of the second-order difference matrix. Because the second-order difference matrix of a downsampled image is not periodic, the method performs poorly on downsampling.
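Gallagher's idea above (row-wise second-order difference followed by a Fourier transform of each row) can be sketched in a few lines. This is an illustrative reconstruction, not code from the patent or the cited paper; the function name and the nearest-neighbour upsampling used in the demo are assumptions.

```python
import numpy as np

def resampling_spectrum(image):
    """Average magnitude spectrum of the row-wise second-order difference.

    For a resampled (interpolated) image the second difference is
    periodically correlated, which shows up as peaks in this spectrum.
    """
    d2 = np.diff(image.astype(np.float64), n=2, axis=1)  # second-order difference per row
    spectrum = np.abs(np.fft.fft(d2, axis=1))            # DFT of every row
    return spectrum.mean(axis=0)                          # average over all rows

rng = np.random.default_rng(0)
original = rng.random((64, 64))
upsampled = np.repeat(original, 2, axis=1)  # 2x horizontal resampling (nearest neighbour)
spec = resampling_spectrum(upsampled)
```

A forensic detector would then threshold the prominence of peaks in `spec`; as the text notes, downsampled images lack this periodicity, so the cue disappears.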
Traditional methods suffer from the following problems. First, because operation traces are not prominent, effective operation features are difficult to extract by hand. Second, feature extraction and the classifier are designed separately, so the two cannot be optimized jointly. Third, after malicious tampering a multimedia file may undergo further post-processing, which can erase or mask the operation traces and makes traditional forensics very difficult. Finally, for multi-operation forensics, traditional methods are complex to implement and their performance is extremely limited.
In recent years, deep learning networks (Deep Learning Network, DLN) have achieved great success in many areas, such as image classification, generation and segmentation, object detection and localization, natural language processing, and document analysis. The DLN departs from traditional hand-crafted pipelines and adopts an entirely data-driven optimization workflow: one only needs to build a suitable neural network (NN) for the problem at hand, train it on a sample set, and optimize the NN parameters through training so that it outputs correct predictions. The NN organically combines feature extraction and classification in one framework, obtaining an optimized feature representation and classifier in a data-driven manner. Given the excellent performance of DLN, academia has begun to study DLN-based forensics.
Chen et al. proposed a median-filtering forensics scheme based on convolutional neural networks (CNNs) [Jiansheng Chen, Xiangui Kang, Ye Liu, and Z. Jane Wang. 2015. Median filtering forensics based on convolutional neural networks. IEEE Signal Processing Letters 22, 11 (2015), 1849-1853]. Targeting the median-filtering operation, a preprocessing layer is designed to extract the median-filtering residual image, which is then fed into the CNN. Bayar et al. use a constrained convolution layer to suppress image content and extract operation features, and employ CNNs for multi-operation tamper detection [Belhassen Bayar and Matthew C. Stamm. A deep learning approach to universal image manipulation detection using a new convolutional layer. In The 4th ACM Workshop on Information Hiding and Multimedia Security. ACM, 5-10, 2016]. This method can detect only one operation per image. Cozzolino et al. show that residual-based local descriptors can be regarded as a simple constrained CNN for forgery detection [Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva. 2017. Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In The 5th ACM Workshop on Information Hiding and Multimedia Security. ACM, 159-164]. The residual unit adds the input directly to the output before re-activation, which not only alleviates network degradation but can also be viewed as a compact constraint. Rao et al. propose using SRM kernels in the first CNN layer to obtain local noise information of an image for tamper detection [Yuan Rao and Jiangqun Ni. 2016. A deep learning approach to detection of splicing and copy-move forgeries in images. In IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 1-6].
These methods can indicate whether an image has been tampered with, but they cannot accurately locate the operated region.
Digital forensics must not only detect whether an operation occurred but also locate where it occurred. Li et al. designed a set of tamper detectors based on multi-scale CNNs [Haodong Li, Weiqi Luo, Xiaoqing Qiu, and Jiwu Huang. 2017. Image forgery localization via integrating tampering possibility maps. IEEE Transactions on Information Forensics and Security 12, 5 (2017), 1240-1252]. The scheme generates a series of complementary tamper-confidence heat maps and locates tampered regions in a digital image using multi-scale features; fusing multi-scale features allows targets of different sizes to be detected well. Zhou et al. proposed a dual-stream network based on Faster R-CNN to locate tampered regions [P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. Learning rich features for image manipulation detection [C]. International Conference on Computer Vision and Pattern Recognition, 2018: 1053-1061]. The method uses the RGB stream for bounding-box regression and combines the noise stream with the RGB stream for classification. A common problem of these two approaches is weak robustness. Wang Changmeng et al. add an attention mechanism to a semantic segmentation network to increase the attention paid to tampered edges, and use a maximum-entropy Markov model to model the correlation between adjacent regions in the attention map [Wang Changmeng. An attention-CNN-based method for tamper detection in document and certificate images: China, CN112907598A [P]. 2021-06-04]. That patent targets tamper detection of qualification certificates and documents.
Forensics based on convolutional neural networks can extract effective features automatically, train on large-scale datasets, generalize well, and achieve detection performance clearly superior to traditional schemes. However, existing forensic techniques generally detect and locate only a single operation type, and because the feature extractor or preprocessing layer is designed for a fixed operation type, they are hard to extend. Localization methods usually split the image into small blocks for detection; the block partition is fixed, inflexible, and yields low localization accuracy. Moreover, existing methods are weakly robust to post-processing operations.
Disclosure of Invention
In view of the drawbacks of the prior art, the object of the present invention is to propose a forensic method capable of detecting and locating multiple types of operations simultaneously while improving robustness to post-processing operations.
In order to achieve the above purpose, the invention adopts the technical scheme that: a multi-operation detection method based on multi-scale feature fusion and multi-branch prediction comprises the following steps:
(1) Selecting a multimedia operation type, and constructing a multimedia data set processed by various operations;
(2) The residual block convolution flow is used as a main network for extracting composite operation characteristics, and a multi-operation detection depth convolution neural network is constructed by combining multi-scale characteristic fusion and multi-branch prediction links;
(3) Training the detection network by using the constructed data set to obtain an optimized detection network model.
Further, in step (1), each sample of the dataset undergoes more than one operation, such as filtering, noise addition, or sharpening, and the area and shape of the operated region may be arbitrary.
In step (2), the backbone network for extracting composite operation features is formed by connecting a group of residual blocks in series, so that the resolution of the feature map output by each residual block decreases progressively while its channel count increases. Multi-scale feature fusion is then performed on the feature maps of different resolutions produced by the backbone, and the result is passed to the multi-branch prediction stage for operation-type classification and bounding-box regression.
Still further, the backbone feature-extraction network is built from more than 5 residual blocks, each consisting of convolution, pooling, and BN layers.
Still further, the pooling layers in the residual blocks use a stride to reduce the feature-map resolution, and the convolution outputs are non-linearly activated.
Further, multi-scale feature fusion proceeds as follows: the highest-layer operation feature map produced by the backbone is upsampled and superimposed on the next lower-layer operation feature map to obtain a fused feature map, and this process is repeated to obtain fused feature maps at the other resolutions. The highest-layer output features are also downsampled to obtain feature maps at two or more further resolutions. Together with the previously generated fused feature maps, these form the multi-scale features that are fed into the respective multi-branch prediction stages.
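The fusion scheme just described can be sketched with plain arrays. This is a minimal illustration under stated assumptions: all feature maps share a channel count (a real network would match channels with 1x1 convolutions), upsampling is nearest-neighbour, and the function names are invented here.

```python
import numpy as np

def upsample2(x):
    # nearest-neighbour 2x upsampling of a (C, H, W) feature map
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def downsample2(x):
    # stride-2 subsampling of a (C, H, W) feature map
    return x[:, ::2, ::2]

def fuse(c3, c4, c5):
    """c3..c5: backbone outputs from lower to highest layer.
    Returns the multi-scale feature set described in the text."""
    f4 = c4 + upsample2(c5)   # top map upsampled, superimposed on the next lower map
    f3 = c3 + upsample2(f4)   # repeat to obtain the next fused map
    d6 = downsample2(c5)      # extra scales from downsampling the top map
    d7 = downsample2(d6)
    return [f3, f4, c5, d6, d7]

c3 = np.ones((16, 32, 32))
c4 = np.ones((16, 16, 16))
c5 = np.ones((16, 8, 8))
pyramid = fuse(c3, c4, c5)
```

Each element of `pyramid` would then feed one group of prediction branches, one per resolution.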
Furthermore, in the multi-branch prediction stage, several anchor boxes of different sizes and aspect ratios are placed at each pixel position of the feature map at every resolution and sent to a classification branch module and a box-regression branch module. Each branch applies convolution operations to further extract features and produce its prediction; the box regression predicts the offset of the operated region relative to the anchor box.
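The text above says the regression branch predicts offsets of the operated region relative to an anchor box. The patent does not give the exact parameterisation, so the sketch below uses the common detection encoding (centre offsets normalised by anchor size, log-scale width and height) purely as an assumption.

```python
import numpy as np

def encode(anchor, box):
    """Offsets of a ground-truth box relative to an anchor; both are
    (cx, cy, w, h). Standard detection parameterisation, assumed here."""
    ax, ay, aw, ah = anchor
    bx, by, bw, bh = box
    return np.array([(bx - ax) / aw, (by - ay) / ah,
                     np.log(bw / aw), np.log(bh / ah)])

def decode(anchor, offsets):
    """Inverse of encode: recover the predicted box from the offsets."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = offsets
    return np.array([ax + dx * aw, ay + dy * ah,
                     aw * np.exp(dw), ah * np.exp(dh)])

anchor = (0.5, 0.5, 0.2, 0.2)
gt_box = (0.55, 0.48, 0.25, 0.18)
offsets = encode(anchor, gt_box)
recovered = decode(anchor, offsets)
```

During training the network regresses `offsets`; at inference `decode` turns the predicted offsets back into a box in image coordinates.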
Further, in step (3), training of the detection network uses data augmentation, random dropout, and L2 regularization to reduce overfitting of the model.
The invention has the following effects: operation features are extracted automatically and adaptively, with no need for a preprocessing layer tailored to a particular operation; multiple types of operations can be detected and located, with localization accuracy far higher than block-wise detection; robustness to post-processing operations such as noise addition, scaling, blurring, and secondary compression is improved; detection is end-to-end, taking an image as input and directly outputting the detection result; detection is fast; and good performance is achievable on large-scale datasets.
Drawings
FIG. 1 is a basic flow of a multi-operation detection method based on multi-scale feature fusion and multi-branch prediction
FIG. 2 is a network block diagram of one embodiment of the invention
FIG. 3 is an original image without operations
FIG. 4 is a pseudo-color map of the generated operation regions
FIG. 5 is an image after tampering operations
FIG. 6 shows the detection results for tampering operations in this embodiment
Detailed Description
A specific embodiment of the present invention is described below with reference to the accompanying drawings to further illustrate its effects.
Taking the image signal as the multimedia representation, the multi-operation detection method based on multi-scale feature fusion and multi-branch prediction is implemented as follows; the overall flow is shown in FIG. 1:
Step 1, constructing the dataset: 17125 three-channel pictures are taken from the PASCAL VOC 2012 dataset (examples are shown in FIG. 3), and eight operation types are selected: homomorphic filtering, median filtering, additive white Gaussian noise, local histogram equalization, Gaussian blur, edge sharpening, local resampling, and gamma transformation. One or more irregular operation regions are randomly generated per picture using a region random-growth algorithm, as shown in FIG. 4. Each region is processed with one of the eight operation types chosen at random, yielding the manipulated image shown in FIG. 5. During training and testing, each picture is one sample. To supervise the training process and provide a reference for computing the detector's evaluation metrics, label information is recorded for each image sample: the image width and height and, for each region, the operation type and its left, top, right, and bottom boundaries. During training, an image sample is fed into the network, the type and position of each operation region are predicted and compared with the label; learning drives the model's predicted distribution ever closer to the true label distribution, improving the model's performance. Evaluating the detector objectively measures the gap between the network's predicted distribution and the label distribution. The dataset consists of the image samples and their corresponding labels and, once generated, is split into a training set and a test set at a ratio of 9:1.
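The patent does not specify the region random-growth algorithm, so the following is a minimal sketch of the idea: start from a random seed pixel and repeatedly add a random free neighbour of the current region until the target area is reached, then derive the boundary labels from the mask. All names and the growth rule are assumptions.

```python
import numpy as np

def grow_region(h, w, area, rng):
    """Grow an irregular binary mask from a random seed until it covers
    `area` pixels (assumed interpretation of 'region random growth')."""
    mask = np.zeros((h, w), dtype=bool)
    seed = (int(rng.integers(h)), int(rng.integers(w)))
    mask[seed] = True
    frontier = [seed]
    while mask.sum() < area and frontier:
        y, x = frontier[rng.integers(len(frontier))]
        free = [(y + dy, x + dx) for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= y + dy < h and 0 <= x + dx < w and not mask[y + dy, x + dx]]
        if not free:                     # this cell is saturated, drop it
            frontier.remove((y, x))
            continue
        ny, nx = free[rng.integers(len(free))]
        mask[ny, nx] = True
        frontier.append((ny, nx))
    return mask

rng = np.random.default_rng(7)
mask = grow_region(64, 64, 200, rng)

# Boundary labels of the region, as recorded for each sample in the text.
ys, xs = np.nonzero(mask)
label = {"left": int(xs.min()), "top": int(ys.min()),
         "right": int(xs.max()), "bottom": int(ys.max())}
```

One such mask is generated per operation region; the selected operation is then applied only inside the mask, and `label` plus the operation type forms the sample annotation.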
Step 2: a residual-block convolutional stream serves as the backbone network for extracting composite operation features, combined with multi-scale feature fusion and multi-branch prediction stages to build the multi-operation detection deep convolutional neural network. Specifically, the backbone is formed by connecting a group of residual blocks in series; the network structure and parameter configuration are given in Table 1, and the resolution of each residual block's output feature map decreases progressively. Each residual block consists of two residual units; each residual unit applies a 3x3 convolution, ReLU nonlinear activation, BN batch normalization, and Dropout with probability 0.1, followed by max pooling. The residual unit adds its input to its output to form the total output, preventing network degradation while extracting composite operation features; convolution or pooling with stride 2 successively reduces the feature-map resolution and increases the channel count. Multi-scale feature fusion is then applied to the feature maps of different resolutions from the backbone: the highest-layer operation feature map is upsampled and superimposed on the lower-layer operation feature map to obtain a fused feature map, and the process is repeated to obtain two high-resolution fused feature maps; the highest-layer output features are also downsampled to obtain feature maps at two further resolutions. These five feature maps of different resolutions are combined into the multi-scale features and fed into the respective multi-branch prediction stages.
The multi-branch prediction stage performs operation-type classification and bounding-box regression. At each pixel position of the feature map at every resolution, 4 anchor boxes of different sizes and aspect ratios are placed, with width and height equal to 0.1x0.1, 0.2x0.2, 0.2x0.3, and 0.3x0.2 of the original image width and height respectively. They are fed into 5 groups of classification and box-regression branch modules, each branch applying 4 consecutive 3x3 convolution operations to further extract features and obtain the prediction. The classification branch predicts the category of each pixel's anchor boxes; the box regression predicts the offsets of the operation region's center coordinates, width, and height relative to the anchor box. The full network structure is shown in FIG. 2.
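Anchor placement as described above can be sketched directly. Note the garbled size list in the source is read here as the four (width, height) pairs (0.1, 0.1), (0.2, 0.2), (0.2, 0.3), (0.3, 0.2), expressed as fractions of the original image size; that reading, and the cell-centre placement, are assumptions.

```python
import numpy as np

# Four anchor shapes, (w, h) as fractions of image width and height.
ANCHOR_SHAPES = [(0.1, 0.1), (0.2, 0.2), (0.2, 0.3), (0.3, 0.2)]

def anchor_grid(fh, fw):
    """Place the four anchors at every pixel of an fh x fw feature map.
    Returns boxes as (cx, cy, w, h) in normalised image coordinates."""
    anchors = []
    for y in range(fh):
        for x in range(fw):
            cx, cy = (x + 0.5) / fw, (y + 0.5) / fh  # cell centre
            for w, h in ANCHOR_SHAPES:
                anchors.append((cx, cy, w, h))
    return np.array(anchors)

anchors = anchor_grid(8, 8)  # one of the five pyramid resolutions
```

Repeating this for all five resolutions gives the complete anchor set that the classification and regression branches score.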
Table 1 backbone network parameter configuration of one embodiment of the invention
[Table 1 is provided as an image in the original document.]
The symbols in Table 1 are as follows: Conv denotes convolution, with its five parameters being the number of input channels, number of output channels, convolution kernel size, padding size, and stride (s). BN denotes batch normalization (Batch Normalization), ReLU the nonlinear activation function, MaxPool the max-pooling operation, and Dropout the random-deactivation operation, whose parameter is the dropout probability. In layers 1-4, every two consecutive convolution blocks perform one residual operation: the input is added directly to the output to form the total output.
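The residual operation described for Table 1 can be illustrated with a small numpy sketch: two 3x3 convolution stages with ReLU and batch normalization, the input added directly to the output, then stride-2 max pooling. This is a simplified inference-time illustration (Dropout omitted, random weights); shapes and helper names are assumptions, not the patent's exact configuration.

```python
import numpy as np

def conv3x3(x, w):
    """'same' 3x3 convolution; x: (C, H, W), w: (Cout, C, 3, 3)."""
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # accumulate one kernel tap over the whole shifted map
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def batch_norm(x, eps=1e-5):
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool2(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def residual_block(x, w1, w2):
    """Two conv->ReLU->BN stages plus the identity shortcut,
    followed by stride-2 max pooling to halve the resolution."""
    y = batch_norm(np.maximum(conv3x3(x, w1), 0))
    y = batch_norm(np.maximum(conv3x3(y, w2), 0))
    return max_pool2(x + y)   # input added directly to the output

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((8, 8, 3, 3)) * 0.1
w2 = rng.standard_normal((8, 8, 3, 3)) * 0.1
out = residual_block(x, w1, w2)
```

Stacking several such blocks reproduces the backbone behaviour stated in the text: resolution halves at each block while channels can grow.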
Step 3: train the detection network with the constructed dataset to obtain an optimized detection network model. Training uses the SGD optimizer with an initial learning rate of 5x10^-4 and a batch size of 32; from iteration 30000 onward the learning rate is reduced by 30% every 20000 iterations, and training runs for a total of 1x10^6 iterations. Training uses data augmentation, random dropout, and L2 regularization to reduce overfitting of the model. Data augmentation uses random mirror flips, which barely affect the operation regions, in both the X and Y directions. Dropout is applied after the convolutions in the residual units of the backbone network. The L2 regularization coefficient is 0.005.
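The stated learning-rate schedule can be written out explicitly. Interpreting "reduced by 30%" as multiplication by 0.7 at each decay point is an assumption; the step boundaries follow the text (constant until iteration 30000, then a decay every 20000 iterations).

```python
def learning_rate(step, base=5e-4):
    """SGD learning-rate schedule from the embodiment: constant 5e-4,
    then multiplied by 0.7 every 20000 steps starting at step 30000
    (assumed reading of 'reduced by 30%')."""
    if step < 30000:
        return base
    decays = 1 + (step - 30000) // 20000
    return base * (0.7 ** decays)

lrs = [learning_rate(s) for s in (0, 29999, 30000, 50000)]
```

Under this reading the rate drops to 3.5x10^-4 at iteration 30000 and to 2.45x10^-4 at iteration 50000.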
The detection model obtained in this embodiment is tested on 1712 manipulated pictures. The test hardware platform has an i5-9400 CPU at 2.9 GHz and an NVIDIA RTX 2060 GPU. The average precision (Average Precision, AP) for each operation type and the overall mean average precision (mean Average Precision, mAP) are recorded in Table 2. Detection results on some test pictures are shown in FIG. 6.
Table 2 test results of model on test set
[Table 2 is provided as an image in the original document.]
As the data in Table 2 show, the method of the invention can detect multiple types of operations and locate the operated regions simultaneously. Tested on a dataset without post-processing, the detection model of this embodiment reaches a mean average precision of 0.6969, and the AP for homomorphic filtering and Gaussian white noise exceeds 0.85, indicating good detection performance.
To verify robustness under various post-processing conditions, the following experiments were carried out: the test-set images were subjected to four post-processing operations, namely JPEG double compression with quality factor 75% followed by quality factor 95% (Jpeg75 then Jpeg95), scaling (Zoom), salt-and-pepper noise, and bilateral filtering. The AP of each operation type and the mAP under the detection model of this embodiment are recorded in Table 3:
TABLE 3 robustness verification experiment results
[Table 3 is provided as an image in the original document.]
As the data in Table 3 show, when the detection model of this embodiment is tested on the post-processed datasets, the mAP is 0.6574 under the Jpeg75 then Jpeg95 double compression, a drop of 0.0395; 0.6435 under the Zoom resampling operation, a drop of 0.0534; 0.6810 under the salt-and-pepper noise operation, a drop of 0.0159; and 0.6251 under the bilateral filtering operation, a drop of 0.0718. None of these post-processing operations lowers detection accuracy by more than 8%, meaning the detector can still effectively detect and locate tampering operations after post-processing and is therefore strongly robust.
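The quoted robustness deltas follow directly from the per-condition mAP values and the clean-test mAP of 0.6969; a short check confirms the arithmetic, including the "no more than 8%" claim.

```python
# mAP under each post-processing condition, from Table 3's summary,
# compared against the clean-test mAP of 0.6969 from Table 2.
clean_map = 0.6969
post = {
    "Jpeg75 then Jpeg95": 0.6574,
    "Zoom": 0.6435,
    "Salt and pepper noise": 0.6810,
    "Bilateral filters": 0.6251,
}
drops = {name: round(clean_map - v, 4) for name, v in post.items()}
worst = max(drops.values())  # largest accuracy loss across conditions
```

`drops` reproduces the four deltas quoted in the text, and `worst` stays below 0.08.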
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects as illustrative and not restrictive; for example:
1) The multimedia file type is not limited to images and may include audio, video, etc.;
2) The operation type is not limited to the eight types mentioned in the embodiments;
3) The network structure provided by the invention can also be applied to object detection in images, not only to operation detection;
4) The selection of various data set construction parameters and network configuration parameters is not limited to the configuration in the embodiment.
The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (6)

1. A multi-operation detection method based on multi-scale feature fusion and multi-branch prediction comprises the following steps:
(1) Selecting a multimedia operation type, and constructing a multimedia data set processed by various operations; the operation processing comprises filtering, noise adding, sharpening, JPEG compression, histogram equalization, blurring, median filtering, resampling, copy-move, homomorphic filtering, adding Gaussian white noise, local histogram equalization, gaussian blurring, edge sharpening and gamma transformation; performing more than one operation on each sample of the multimedia data set;
(2) The residual block convolution flow is used as a main network for extracting composite operation characteristics, and a multi-operation detection depth convolution neural network is constructed by combining multi-scale characteristic fusion and multi-branch prediction links;
(3) Training the multi-operation detection depth convolutional neural network by using the constructed multimedia data set to obtain an optimized detection network model for classifying operation types and locating the specific position of the operation;
the multi-scale feature fusion comprises: up-sampling the highest-layer operation feature map produced by the backbone network and superimposing it on a lower-layer operation feature map to obtain a fused feature; repeating this process to obtain fused feature maps at the other resolutions; down-sampling the highest-layer output feature to obtain feature maps at two or more additional resolutions; and combining these with the previously generated fused feature maps to form the multi-scale features, each of which is fed into the multi-branch prediction links;
a backbone network for extracting composite operation features is formed by connecting a plurality of residual blocks in series, the serially connected residual blocks being numbered 1, 2, 3, …, n in order from the network input to the network output;
the output feature map of the n-th-layer residual block of the backbone network is fused with the output feature map of the (n-1)-th-layer residual block to obtain a first fused feature map;
the output feature map of the (n-1)-th-layer residual block of the backbone network is fused with the output feature map of the (n-2)-th-layer residual block to obtain a second fused feature map;
the output feature map of the n-th-layer residual block of the backbone network is down-sampled to obtain at least two feature maps of lower resolution, recorded as the highest-layer down-sampling result;
and the final output feature map of the backbone network, namely the output feature map of the n-th-layer residual block, the first fused feature map, the second fused feature map and the highest-layer down-sampling result are combined into the multi-scale features.
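As an illustrative sketch only (not part of the claimed method), the fusion scheme of claim 1 can be expressed with NumPy, using nearest-neighbour up-sampling and stride-2 sub-sampling as stand-ins for the learned layers; the layer names c3–c5 and p3–p7 are hypothetical:

```python
import numpy as np

def upsample2x(f):
    # nearest-neighbour up-sampling: double each spatial dimension
    return f.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(f):
    # strided sub-sampling: halve each spatial dimension
    return f[::2, ::2]

# hypothetical backbone outputs; resolution halves at each residual block
c3 = np.random.rand(32, 32)   # layer n-2
c4 = np.random.rand(16, 16)   # layer n-1
c5 = np.random.rand(8, 8)     # layer n (highest-layer feature map)

# first fused map: up-sample the highest layer and superimpose it on layer n-1
p4 = c4 + upsample2x(c5)
# repeat the process for the next lower layer to obtain the second fused map
p3 = c3 + upsample2x(p4)
# down-sample the highest-layer output to obtain two extra resolutions
p6 = downsample2x(c5)
p7 = downsample2x(p6)

# the multi-scale feature set fed to the multi-branch prediction links
pyramid = [p3, p4, c5, p6, p7]
```

A real implementation would apply 1x1 convolutions to align channel counts before the addition; the sketch keeps single-channel maps so only the spatial bookkeeping is shown.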
2. The multi-operation detection method based on multi-scale feature fusion and multi-branch prediction according to claim 1, wherein: a backbone network for extracting composite operation features is formed by connecting a group of residual blocks in series; the resolution of the feature map output by each residual block decreases successively while its number of channels increases; the feature maps of different resolutions obtained from the backbone network undergo multi-scale feature fusion and are then passed to the multi-branch prediction links for operation-type classification and bounding-box regression prediction.
3. A multi-operation detection method based on multi-scale feature fusion and multi-branch prediction as claimed in claim 2, wherein: the backbone feature extraction network is constructed from more than 5 residual blocks, each residual block consisting of a convolution layer, a pooling layer and a BN layer.
4. A multi-operation detection method based on multi-scale feature fusion and multi-branch prediction as claimed in claim 3, wherein: the pooling layer in each residual block uses a stride to reduce the feature-map resolution, and the convolution output is non-linearly activated.
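To illustrate claims 3 and 4 (again only as a sketch, not the claimed implementation), a residual block with a 3x3 convolution, simplified batch normalization, non-linear activation and strided pooling can be written in NumPy; `conv3x3` is a hypothetical single-channel helper:

```python
import numpy as np

def relu(x):
    # non-linear activation applied to the convolution output (claim 4)
    return np.maximum(x, 0.0)

def conv3x3(x, w):
    # naive 'same' 3x3 convolution over a single-channel map (hypothetical helper)
    h, wd = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w, stride=2):
    y = conv3x3(x, w)
    # BN layer simplified to per-map standardization
    y = (y - y.mean()) / (y.std() + 1e-5)
    y = relu(y + x)            # residual (skip) connection + activation
    # strided pooling reduces the feature-map resolution (claim 4)
    return y[::stride, ::stride]

x = np.random.rand(8, 8)
w = np.random.rand(3, 3) * 0.1
out = residual_block(x, w)     # 8x8 input -> 4x4 output
```

Stacking n such blocks halves the resolution n times, which is what produces the different scales fused in claim 1.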
5. A multi-operation detection method based on multi-scale feature fusion and multi-branch prediction as claimed in claim 2, wherein: in the multi-branch prediction links, a plurality of anchor boxes of different sizes and aspect ratios are placed at each pixel position of the feature map at each resolution and sent to a classification branch module and a bounding-box regression branch module respectively; convolution operations further extract features to produce the prediction results, the bounding-box regression predicting the offset of the operated region relative to the anchor boxes.
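The anchor placement and offset parameterization of claim 5 can be sketched as follows; the sizes, ratios and the standard (dx, dy, dw, dh) encoding are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def make_anchors(fh, fw, stride, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Place len(sizes) * len(ratios) anchor boxes (cx, cy, w, h) at each
    pixel of an (fh, fw) feature map; stride maps pixels back to the image."""
    anchors = []
    for i in range(fh):
        for j in range(fw):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

def box_offsets(gt, anchor):
    """Regression target: offset of the ground-truth operated region
    relative to an anchor, in a common (dx, dy, dw, dh) parameterization."""
    gx, gy, gw, gh = gt
    ax, ay, aw, ah = anchor
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])

anchors = make_anchors(4, 4, stride=16)   # 4x4 map -> 96 anchors
```

An anchor that coincides exactly with the ground-truth box yields a zero offset vector, which is the fixed point the regression branch is trained toward.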
6. A multi-operation detection method based on multi-scale feature fusion and multi-branch prediction as claimed in claim 2, wherein: the detection network is trained with data augmentation, dropout (random inactivation) and L2 regularization to reduce model overfitting.
CN202110751853.9A 2021-07-04 2021-07-04 Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction Active CN113850284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110751853.9A CN113850284B (en) 2021-07-04 2021-07-04 Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction


Publications (2)

Publication Number Publication Date
CN113850284A CN113850284A (en) 2021-12-28
CN113850284B (en) 2023-06-23

Family

ID=78975060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110751853.9A Active CN113850284B (en) 2021-07-04 2021-07-04 Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction

Country Status (1)

Country Link
CN (1) CN113850284B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115795370B (en) * 2023-02-10 2023-05-30 Nanchang University Electronic digital information forensics method and system based on resampling traces
CN118135641A (en) * 2024-05-07 2024-06-04 Qilu University of Technology (Shandong Academy of Sciences) Face forgery detection method based on local forged region detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063728A (en) * 2018-06-20 2018-12-21 Yanshan University Fire image pattern recognition method based on deep learning
CN111191736A (en) * 2020-01-05 2020-05-22 Xidian University Hyperspectral image classification method based on deep feature cross fusion
WO2020199593A1 (en) * 2019-04-04 2020-10-08 Ping An Technology (Shenzhen) Co., Ltd. Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
CN112464930A (en) * 2019-09-09 2021-03-09 Huawei Technologies Co., Ltd. Target detection network construction method, target detection method, device and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11494937B2 (en) * 2018-11-16 2022-11-08 Uatc, Llc Multi-task multi-sensor fusion for three-dimensional object detection
CN110443143B (en) * 2019-07-09 2020-12-18 Wuhan University of Science and Technology Remote sensing image scene classification method fusing multi-branch convolutional neural networks
CN110706242B (en) * 2019-08-26 2022-05-03 Zhejiang University of Technology Object-level edge detection method based on deep residual network
CN110490174A (en) * 2019-08-27 2019-11-22 University of Electronic Science and Technology of China Multi-scale pedestrian detection method based on feature fusion
CN111368754B (en) * 2020-03-08 2023-11-28 Beijing University of Technology Airport runway foreign matter detection method based on global context information
CN111768372B (en) * 2020-06-12 2024-03-12 State Grid Intelligent Technology Co., Ltd. Method and system for detecting foreign matter in a GIS (gas insulated switchgear) cavity
CN112712528B (en) * 2020-12-24 2024-03-26 Zhejiang University of Technology Intestinal lesion segmentation method combining a multi-scale U-shaped residual encoder and an integral reverse attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063728A (en) * 2018-06-20 2018-12-21 Yanshan University Fire image pattern recognition method based on deep learning
WO2020199593A1 (en) * 2019-04-04 2020-10-08 Ping An Technology (Shenzhen) Co., Ltd. Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
CN112464930A (en) * 2019-09-09 2021-03-09 Huawei Technologies Co., Ltd. Target detection network construction method, target detection method, device and storage medium
CN111191736A (en) * 2020-01-05 2020-05-22 Xidian University Hyperspectral image classification method based on deep feature cross fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-scale feature fusion residual network for Single Image Super-Resolution; Jinghui Qin et al.; Neurocomputing; Vol. 379; 334-342 *
Object detection in optical remote sensing images based on multi-scale deconvolution feature fusion network; Chen Jing; China Masters' Theses Full-text Database (Engineering Science and Technology II) (No. 2); C028-151 *
Tampered image recognition based on improved three-stream Faster R-CNN; Xu Dai et al.; Journal of Computer Applications (No. 5); 79-85 *


Similar Documents

Publication Publication Date Title
CN111311563B (en) Image tampering detection method based on multi-domain feature fusion
Zhong et al. An end-to-end dense-inceptionnet for image copy-move forgery detection
Chen et al. A serial image copy-move forgery localization scheme with source/target distinguishment
Li et al. Identification of deep network generated images using disparities in color components
Wang et al. Detection and localization of image forgeries using improved mask regional convolutional neural network
CN110349136A (en) Tampered image detection method based on deep learning
CN113850284B (en) Multi-operation detection method based on multi-scale feature fusion and multi-branch prediction
Yang et al. Spatiotemporal trident networks: detection and localization of object removal tampering in video passive forensics
Gan et al. Video object forgery detection algorithm based on VGG-11 convolutional neural network
CN110457996B (en) Video moving object tampering evidence obtaining method based on VGG-11 convolutional neural network
AlSawadi et al. Copy-move image forgery detection using local binary pattern and neighborhood clustering
Yu et al. Manipulation classification for jpeg images using multi-domain features
CN111476727B (en) Video motion enhancement method for face-changing video detection
CN115393698A (en) Digital image tampering detection method based on improved DPN network
Huang et al. DS-UNet: a dual streams UNet for refined image forgery localization
CN111259792A (en) Face living body detection method based on DWT-LBP-DCT characteristics
Gu et al. FBI-Net: Frequency-based image forgery localization via multitask learning With self-attention
Dixit et al. Utilization of edge operators for localization of copy-move image forgery using WLD-HOG features with connected component labeling
Dixit et al. Copy-move image forgery detection a review
Jin et al. Object-based video forgery detection via dual-stream networks
Xia et al. Abnormal event detection method in surveillance video based on temporal CNN and sparse optical flow
Gan et al. Highly accurate end-to-end image steganalysis based on auxiliary information and attention mechanism
CN111814543B (en) Depth video object repairing and tampering detection method
CN115100128A (en) Depth forgery detection method based on artifact noise
CN108364256A (en) Image splicing detection method based on quaternion wavelet transform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant