CN110210498B - Digital image equipment evidence obtaining system based on residual learning convolution fusion network - Google Patents

Digital image equipment evidence obtaining system based on residual learning convolution fusion network

Info

Publication number
CN110210498B
CN110210498B
Authority
CN
China
Prior art keywords
digital image
convolution
unit
equipment
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910472188.2A
Other languages
Chinese (zh)
Other versions
CN110210498A (en
Inventor
倪蓉蓉
杨朋朋
李欣
赵耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201910472188.2A priority Critical patent/CN110210498B/en
Publication of CN110210498A publication Critical patent/CN110210498A/en
Application granted granted Critical
Publication of CN110210498B publication Critical patent/CN110210498B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention provides a digital image device forensics system based on a residual learning convolution fusion network. The system comprises: a digital image input unit, a digital image preprocessing unit, an image feature extraction and classification unit, and a digital image device identification result output unit. The digital image preprocessing unit performs residual learning filtering on the digital image transmitted by the digital image input unit; the image feature extraction and classification unit performs bottom-layer feature extraction and high-layer semantic feature extraction on the preprocessed digital image using a fusion network of weight convolution and conventional convolution; the digital image device identification result output unit obtains device-related features from the high-layer semantic features of the digital image and outputs the device forensics result of the digital image according to these features. The embodiment of the invention can effectively perform device forensics on digital images even at small resolution, and effectively improves the device identification performance for digital images.

Description

Digital image equipment evidence obtaining system based on residual learning convolution fusion network
Technical Field
The invention relates to the technical field of digital image processing, in particular to a digital image equipment evidence obtaining system based on a residual learning convolution fusion network.
Background
With the rapid development of digital imaging devices, camcorders, digital cameras, and mobile phones have become ubiquitous in daily life. Digital image acquisition, distribution and sharing have become a popular means of information transmission and exchange in modern social networks. At the same time, powerful and easy-to-use digital image processing software provides simple tools for editing digital images. The security of digital images has therefore attracted a great deal of attention in the last decade, especially in judicial and criminal investigation. Digital image forensic technology, as a multimedia security technology, can be used to verify the originality, integrity and authenticity of digital images, and is of great significance to justice and social order. Digital image forensics research aims to answer several questions: first, by which device in a candidate pool a digital image was captured, i.e., forensics of the digital image source; second, whether a digital image is an original or a tampered one; third, what degree of tampering has been applied to which parts of the digital image; and finally, identification of the processing history of the digital image. As the primary task of digital image forensics, digital image source forensics is an important research topic. Digital image source forensics technology can establish a mapping between multimedia digital image information and physical devices. Device forensics, as the main technology of digital image source forensics, can provide important evidence for intellectual property protection or criminal cases. For example, a copyright-protected digital image may be redistributed without authorization from the author.
Device forensics technology can determine whether such a digital image was shot by the original camera or by some other camera, so that the copyright ownership of the digital image can be judged. In criminal investigation cases, investigators can use device forensics technology to establish the relationship between illegal digital images and actual physical individuals, thereby assisting in confirming the identity of suspects. Device forensics techniques are therefore urgently needed.
Device forensics is an important branch of digital image source forensics; its research problem is how to effectively distinguish the model or brand of the device used for digital image acquisition. The first task in researching device forensics is to understand the imaging principle of a camera, so that effective features can be extracted to realize the mapping between multimedia digital image information and physical devices. The basic principle of camera imaging is that light from a natural scene is focused onto a digital image sensor through a lens system and a color filter. The digital image sensor is a pixel array composed of many tiny pixels arranged according to a certain rule; typical sensors are CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor). The photo-sensing pixel array integrates light across the entire spectrum and converts it into electrical signals. Since most digital image sensors use only one CFA (color filter array), i.e., the sensor outputs a monochrome digital image, a color digital image must be obtained by color interpolation in the digital image processing stage. After white balance, gamma correction and other operations are applied to the interpolated digital image, it is converted into a specific digital image format and stored to obtain a digital photo. The whole process can be regarded as a 'pipeline', and each digital image operation in the pipeline leaves some 'fingerprints' that are unrelated to the digital image content; these fingerprints can be used to establish the connection between a digital image and the device that shot it, and thus serve as the key entry point for device forensics.
For device forensics, domestic and foreign scholars have proposed a series of schemes, such as device forensics based on lens distortion models, sensor dust patterns, demosaicing, JPEG compression, sensor pattern noise, etc. Sensor pattern noise in particular has proven to be a very effective technique for device forensics. Pattern noise is generated by unavoidable manufacturing imperfections in the sensor manufacturing process. It is unique to each individual camera and can be used to distinguish different devices of the same make and model. Lukas et al. first applied the pattern noise in the digital image sensor as the intrinsic fingerprint of the device to camera source identification, extracting the pattern noise with a wavelet denoising technique and using it as the link between a digital image and its device. Chen et al. established a simplified model of the digital image captured by a camera and, based on this model, proposed a maximum likelihood estimation method for PRNU noise. Because the quality of the extracted PRNU is greatly influenced by the digital image content, pattern noise extracted from smooth and bright digital images is of good quality, while pattern noise extracted from textured and dark digital images is of poor quality. To reduce the influence of digital image content, C. Li ranks digital images according to their content and assigns different weight coefficients to the pattern noise of different ranks, thereby suppressing the content information of textured areas. X. Kang uses a content-adaptive interpolation algorithm to better interpolate the digital image, in particular preserving edge information; the residual digital image obtained by differencing in this way better reduces the influence of edges.
Later work improved on the local discrete cosine transform filter so that the filtered digital image better preserves the original image quality and the residual digital image is less influenced by the digital image content. In the post-processing stage, operations such as JPEG compression and CFA interpolation are performed block-wise and periodically, so the extracted noise residual also exhibits periodicity; fingerprints from cameras with the same post-processing share this periodicity, which raises the false-identification rate. To suppress these periodicities, subsequent articles operate on the Fourier spectrum of the pattern noise, suppressing peaks in the spectrum to reduce the effect of periodicity.
The effectiveness, robustness and universality of prior-art device forensics schemes all have certain deficiencies. Although sensor pattern noise is a stable and specific fingerprint, it is a weak signal compared with the digital image content signal, so the primary task of a device forensics algorithm is to extract the sensor pattern noise accurately. Accurate extraction of sensor pattern noise requires solving two key problems:
1) the accuracy of sensor pattern noise extraction is strongly correlated with the digital image content: sensor pattern noise extracted from smooth digital image areas is of good quality, while extraction from non-smooth areas is poor;
2) the accuracy of sensor pattern noise extraction strongly depends on the size of the digital image: a large digital image contains more statistical information and therefore gives better detection performance, but on small digital images the performance of existing algorithms cannot reach the level required for engineering application.
Disclosure of Invention
The embodiment of the invention provides a digital image device forensics system based on a residual learning convolution fusion network, which aims to overcome the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A digital image equipment forensics system based on a residual learning convolution fusion network comprises: a digital image input unit, a digital image preprocessing unit, an image feature extraction and classification unit, and a digital image equipment identification result output unit;
the digital image input unit is used for receiving an externally input digital image and transmitting the digital image to the digital image preprocessing unit;
the digital image preprocessing unit is used for preprocessing the digital image transmitted by the digital image input unit, the preprocessing comprises residual error learning filtering processing, and the preprocessed digital image is transmitted to the image feature extraction and classification unit;
the image feature extraction and classification unit is used for extracting bottom-layer features and high-layer semantic features of the preprocessed digital images transmitted by the digital image preprocessing unit by using a fusion network of weight convolution and conventional convolution, and transmitting the obtained high-layer semantic features of the digital images to the digital image equipment identification result output unit;
and the digital image equipment identification result output unit is used for obtaining equipment related characteristics according to the high-level semantic characteristics of the digital images transmitted by the image characteristic extraction and classification unit and outputting equipment evidence obtaining results of the digital images according to the equipment related characteristics.
Preferably, the digital image is a 64x64 resolution digital image.
Preferably, the residual learning filtering in the digital image preprocessing unit comprises: a main branch that applies 64 convolution kernels of size 1x1 with stride 1, followed by batch normalization and a ReLU activation; then 64 convolution kernels of size 3x3 with stride 1, followed by batch normalization and ReLU; then 256 convolution kernels of size 1x1 with stride 1, followed by batch normalization and ReLU, yielding the output F(I, W_k). In parallel, a shortcut branch applies 256 convolution kernels of size 1x1 with stride 1 to the input image, followed by batch normalization and ReLU, yielding the output W_s I. The two outputs are summed and passed through a ReLU activation to obtain R = F(I, W_k) + W_s I, which completes the residual learning filtering of the image.
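A minimal PyTorch sketch of the residual learning filtering block described above. The channel counts and kernel sizes follow the text; the input channel count and the shortcut-branch ReLU placement are assumptions based on the wording, not confirmed by the patent drawings.

```python
import torch
import torch.nn as nn

class ResidualFilter(nn.Module):
    """Residual-learning preprocessing block: R = F(I, W_k) + W_s * I."""
    def __init__(self, in_ch=3):
        super().__init__()
        # Main branch F(I, W_k): 1x1x64 -> 3x3x64 -> 1x1x256, each with BN + ReLU
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1, stride=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, kernel_size=1, stride=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
        )
        # Shortcut branch W_s * I: 1x1x256 projection with BN + ReLU
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=1, stride=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Sum the two branches, then apply the final ReLU
        return self.relu(self.main(x) + self.shortcut(x))
```

For a 64x64 input the block preserves the spatial size (the 3x3 convolution uses padding 1) and outputs 256 channels.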
Preferably, the image feature extraction and classification unit is configured to extract bottom-layer features and inter-pixel correlation features of the preprocessed digital image using the weight convolution structure of the weight convolution and conventional convolution fusion network, input these features into the conventional convolution structure of the fusion network, and extract the high-level semantic features of the digital image through the conventional convolution structure, where the high-level semantic features are the device-related features.
Preferably, the weight convolution structure in the image feature extraction and classification unit comprises a squeezing unit and an excitation unit, and the squeezing unit and the excitation unit can construct any given affine transformation;
the extrusion unit is used for obtaining a channel descriptor by considering extrusion global space information, obtaining channel-by-channel statistics by using a global average pooling mode, and formulating to be as follows, wherein z is output obtained by shrinking a space dimension through U, and for the c channel, z can be obtained by calculating according to the following formula
Figure BDA0002081116870000051
The excitation unit is configured to capture channel-wise dependencies with a mapping function, defined as follows:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))  (8)
where σ denotes the sigmoid function and δ denotes the ReLU activation function.
the output of the image feature extraction and classification unit is obtained by activating a function scaling transformation, as follows
Figure BDA0002081116870000052
F is the product between the scale factor s and the characteristic graph u channel by channel, and the excitation unit maps the input z to a set of channel special weights;
The bottom-layer features and inter-pixel correlation features output by the excitation unit are then input into the conventional convolution structure.
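The squeeze-excitation weighting described by formulas (7)-(9) can be sketched in PyTorch as follows. The reduction ratio between the two linear layers is an assumption (the patent does not give the dimensions of W_1 and W_2); the squeeze, excitation, and scale steps follow the equations directly.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Weight-convolution (squeeze-excitation style) unit for C channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)     # squeeze: global average pool, Eq. (7)
        self.fc = nn.Sequential(                # excitation: sigma(W2 * delta(W1 * z)), Eq. (8)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, u):
        b, c, _, _ = u.shape
        z = self.pool(u).view(b, c)             # z_c: one statistic per channel
        s = self.fc(z).view(b, c, 1, 1)         # channel-specific weights s
        return u * s                            # scale: s_c * u_c, Eq. (9)
```

Because the last excitation activation is a sigmoid, each channel of the input is scaled by a weight in (0, 1).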
Preferably, the digital image device identification result output unit is configured to obtain a device forensics result in the form of a 1xN-dimensional vector, where N is the number of candidate digital image acquisition devices, select from this vector the device ID corresponding to the highest score, and use that device ID as the output result, i.e., identify the digital image under test as having been acquired by that device.
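The decision rule above is a simple argmax over the 1xN score vector; a minimal sketch (the function and variable names are illustrative, not from the patent):

```python
def identify_device(scores, device_ids):
    """Given the 1xN output scores and the list of N candidate device IDs,
    return the ID with the highest score (the device forensics decision)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return device_ids[best]
```

For example, with scores [0.1, 0.7, 0.2] over three candidate cameras, the second device ID is returned.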
According to the technical scheme provided by the embodiment of the invention, a weight convolution structure is adopted to extract the bottom-layer features of the preprocessed digital image and a conventional convolution structure is adopted to extract the high-layer semantic features, so that device forensics can be performed effectively even on small-resolution digital images, and the device identification performance for digital images is effectively improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating an implementation principle of a digital image device forensics system based on a residual learning convolution fusion network according to an embodiment of the present invention;
fig. 2 is a structural diagram of a digital image device forensics system based on a residual learning convolution fusion network according to an embodiment of the present invention;
fig. 3 is a structural diagram of a residual learning convolution fusion network according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a digital image device forensics method based on a residual learning convolution fusion network according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a device list and image information according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating performance comparison of an apparatus identification system according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Example one
To address the problems of traditional schemes, in the last two years device forensics researchers have begun to pay attention to algorithms based on deep learning, and preliminary work has been proposed. Deep learning is a leading-edge field of machine learning which aims to build neural networks that simulate the human brain, so that the mechanisms by which the human brain interprets data can be mimicked. Different from traditional digital image device forensics algorithms, deep learning integrates feature extraction and feature classification into one network structure, realizing end-to-end automatic feature learning and classification. Such methods typically combine a preprocessing stage with a fusion network of weight convolution and conventional convolution; a system block diagram is shown in fig. 1. Preliminary studies applying deep learning to device forensics demonstrate its effectiveness over traditional approaches. However, the performance of existing device forensics schemes still needs improvement; the optimization path comprises two aspects, namely the exploration of a suitable preprocessing scheme and the design of a deep learning network structure for effective feature extraction and classification. Based on the current state of research in device forensics, we focus on the design of a camera source identification system for small-size digital images.
The embodiment of the invention provides a digital image device forensics system based on residual learning and a fusion network of weight convolution and conventional convolution, which improves the effectiveness and practicability of the system in two respects. First, the digital image is preprocessed by residual learning, which effectively improves the signal-to-noise ratio of the device features. Second, weight convolution is used in the bottom layers of the fusion network and conventional convolution layers in the higher layers, optimizing feature extraction while keeping the computational cost of the system in check. For small-resolution digital images, the method can effectively improve the device identification performance.
The schematic diagram of the implementation principle of the digital image device forensics system based on residual learning and the weight convolution and conventional convolution fusion network provided by the embodiment of the invention is shown in fig. 1, and the specific structural diagram is shown in fig. 2. The system comprises the following units: a digital image input unit, a digital image preprocessing unit, an image feature extraction and classification unit, and a digital image device identification result output unit.
The digital image input unit is used for receiving an externally input digital image and transmitting the digital image to the digital image preprocessing unit;
the digital image preprocessing unit is used for preprocessing the digital image transmitted by the digital image input unit; the preprocessing comprises residual learning filtering, and the preprocessed digital image is transmitted to the image feature extraction and classification unit. High-pass filtering and convolution filtering are existing filtering technologies; all three kinds of filtering were tested, each fused with the convolutional neural network structure as a preprocessing step, and the proposed residual learning filtering was found to be the optimal preprocessing mode, so residual learning filtering is finally used in the preprocessing unit.
The image feature extraction and classification unit is used for extracting bottom-layer features and high-layer semantic features of the preprocessed digital images transmitted by the digital image preprocessing unit by using a fusion network of weight convolution and conventional convolution, and transmitting the obtained high-layer semantic features to the digital image device identification result output unit. The network structure, from shallow layers to deep layers, performs feature extraction layer by layer. According to existing research, the shallow layers of a network learn the bottom-layer features of an image, such as edge and corner features, color features, and texture features; the deep layers learn the high-level semantic features of the image. The embodiment of the invention extracts the bottom-layer features and inter-pixel correlation features of the preprocessed digital image using the weight convolution structure of the fusion network. These features are then input into the conventional convolution structure of the fusion network, which extracts the high-level semantic features of the digital image; these high-level semantic features are the device-related features.
And the digital image equipment identification result output unit is used for obtaining equipment related characteristics according to the high-level semantic characteristics of the digital images transmitted by the image characteristic extraction and classification unit and outputting equipment evidence obtaining results of the digital images according to the equipment related characteristics. The evidence obtaining result of the equipment is a 1 xN-dimensional vector, N is the number of the digital image obtaining equipment, the equipment ID corresponding to the value with the highest score is selected from the 1 xN-dimensional vector, and the equipment ID is used as an output result, namely, the digital image to be detected is identified to be obtained by the equipment ID.
The function of each unit is described in detail below.
A digital image input unit: the input image is required to have a resolution of 64x64. During training of the system model, sliding-window cropping with a 64x64 window is performed on the images of the training data set to obtain a large-scale training set at 64x64 resolution, and the trained model parameters are stored. During testing, 64x64-resolution images are used as the input of the system.
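The sliding-window cropping used to build the 64x64 training set can be sketched as follows; a non-overlapping stride equal to the window size is an assumption, since the patent does not state the step length.

```python
import numpy as np

def sliding_crops(image, size=64, stride=64):
    """Cut an HxWxC image into size x size patches with the given step length."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size])
    return np.stack(patches)
```

A 128x128 RGB image yields four 64x64 patches at this stride; a smaller stride produces overlapping patches and a larger training set.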
A digital image preprocessing unit: the image preprocessing process is particularly important for a deep-learning-based digital image device forensics system; effective preprocessing removes the interference of image content, improving the signal-to-noise ratio of the device forensics features and thereby the system performance. The embodiment of the invention applies a residual learning filtering preprocessing operation to the digital image; the preprocessing operation and the convolutional neural network structure are integrated for end-to-end learning, and the best filtering scheme is selected through the training process.
The embodiment of the invention embeds three preprocessing modes, namely high-pass filtering, convolution filtering and residual learning filtering, into the convolutional neural network structure for end-to-end learning, and through training finds the optimal preprocessing scheme, namely residual learning filtering.
As shown in fig. 3, the two alternative preprocessing manners, 5x5 high-pass filtering and 7x7 convolution filtering, can likewise be embedded into the convolutional neural network structure for end-to-end learning. Using a fixed image filtering template in the high-pass filtering preprocessing operation, without specifically considering the influence of the image content, is not an efficient approach. The convolution preprocessing operation adapts to different input images, but requires a large amount of training data to generalize well. The rationality of residual learning preprocessing can be analyzed by modeling the device to obtain a sensor noise model.
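For reference, fixed high-pass residual filtering of the kind mentioned above can be sketched as follows. The particular 5x5 kernel shown is a commonly used high-pass kernel from the steganalysis/forensics literature, chosen here purely as an illustration; the patent does not specify which template its 5x5 high-pass baseline uses.

```python
import numpy as np

# Illustrative 5x5 high-pass residual kernel (rows and columns each sum to zero,
# so a flat image produces a zero residual).
KV_KERNEL = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64) / 12.0

def high_pass_residual(image, kernel=KV_KERNEL):
    """'Valid' 2-D correlation of a grayscale image with a high-pass kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```

Because the kernel coefficients sum to zero, smooth image content is suppressed and only high-frequency residual (where the device fingerprint lives) remains.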
Taking sensor pattern noise as an example, the quality of the extracted sensor pattern noise is strongly correlated with the content of the image from which it is extracted. The embodiment of the invention analyzes this principle theoretically, first establishing the image acquisition model shown in formula (1):
I = I^(0) + γIK + Θ  (1)
where I denotes the actually acquired image, I^(0) the image in the ideal case, K the sensor pattern noise, γ the image correction parameter, and Θ the additive noise generated during image acquisition.
After low-pass filtering the acquired image, the difference between the original image and its low-pass filtered version yields a noise residual W. Using maximum likelihood estimation, an estimate of the sensor pattern noise can then be obtained, as shown below,

K̂ = ( Σ_{k=1}^{N} W_k I_k ) / ( Σ_{k=1}^{N} I_k^2 ) (2)

The variance of the estimate K̂ is bounded from below by the Cramér-Rao lower bound, as shown below,

var(K̂ - K) ≥ σ^2 / ( Σ_{k=1}^{N} I_k^2 ) (3)

where K̂ represents the estimated value of the sensor pattern noise, K the true value of the sensor pattern noise, σ^2 the variance of the image noise, I_k the k-th image, and N the number of images used to estimate the sensor pattern noise.
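The maximum likelihood estimate and the Cramér bound described above can be checked with a small simulation; the denoiser is idealized here (the true scene is assumed known when computing the residual W), and all numeric settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
k_true = rng.normal(0.0, 0.01, (32, 32))  # true sensor pattern noise K
sigma = 2.0                               # std of the additive noise Theta
imgs, resids = [], []
for _ in range(200):                      # N = 200 images
    i0 = rng.uniform(50.0, 200.0, (32, 32))
    img = i0 * (1.0 + k_true) + rng.normal(0.0, sigma, (32, 32))
    resids.append(img - i0)               # noise residual W_k (ideal denoiser)
    imgs.append(img)

den = sum(i * i for i in imgs)
# ML estimate: K_hat = sum_k(W_k * I_k) / sum_k(I_k^2)
k_hat = sum(w * i for w, i in zip(resids, imgs)) / den
# Cramér-Rao lower bound on the variance of the estimate: sigma^2 / sum_k(I_k^2)
crlb = sigma ** 2 / den
```

With bright images (large I_k) the bound shrinks, matching the observation in the text that estimation quality depends on image content.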
From the above bound it can be concluded that the deviation of the estimate of the sensor pattern noise is strongly correlated with the image content: smooth images whose pixel values approach saturation give a better detection effect. A fixed filtering operation applied to the image in the preprocessing stage therefore loses part of the valid information. In the present invention, instead of fixed high-pass filtering or plain convolution filtering, a residual convolution operation is used in the preprocessing layer, allowing the network to learn a suitable preprocessing template from the input data, with the formulas:
R = F(I, W_k) + W_s I (4)
F(I, W_k) = R - W_s I (5)
where R denotes the residual output, I the input, and W_k, W_s the convolution kernel parameters.
The reasons for considering residual learning preprocessing are as follows. First, it can be realized with a model representation consistent with the sensor pattern noise estimation model above. Second, a large body of work in the field of image recognition shows that residual learning is an effective feature representation: VLAD encodes residual vectors with respect to a dictionary, and the Fisher vector can be regarded as a probabilistic version of VLAD; both are effective feature representations for image retrieval and classification, showing that encoding residual vectors is beneficial, and such schemes converge much faster than their standard counterparts, indicating that residual representations simplify the optimization problem. Third, consider F(I, W_k) as an underlying target mapping to be fitted by a stack of layers, with I denoting the input to the first of these layers. The present invention assumes that multiple non-linear layers can approximate a complex function, which, given that the input and output have the same dimensionality, is equivalent to assuming that they can approximate the residual function. These layers are therefore used to fit the residual function F(I, W_k) = R - W_s I, so that the original function becomes R = F(I, W_k) + W_s I. Although the two forms should be equivalent, the ease with which the network learns them differs.
The structure diagram of the residual preprocessing convolution fusion network provided by the embodiment of the invention is shown in fig. 3. First, the image is convolved with 64 convolution kernels of size 1x1 and stride 1, followed by batch normalization and a ReLU activation; next, a convolution with 64 kernels of size 3x3 and stride 1 is applied, again followed by batch normalization and ReLU; then a convolution with 256 kernels of size 1x1 and stride 1 is applied, followed by batch normalization and ReLU, giving the output F(I, W_k). In parallel, the input image is convolved with 256 kernels of size 1x1 and stride 1, followed by batch normalization and ReLU, giving the output W_s I. The two outputs are added and passed through a ReLU activation to obtain R = F(I, W_k) + W_s I, realizing the residual learning preprocessing of the image. The result of the residual preprocessing serves as input to the convolutional neural network. The residual preprocessing convolutional neural network extracts camera fingerprint features with five convolution units, each consisting of four operations: convolution, batch normalization, ReLU activation, and average pooling. The numbers of feature maps of the five convolution units are 8, 16, 32, 64 and 128; the convolution kernel size is 5x5 with stride 1. Note that in the fifth convolution unit a global average pooling operation is used to reduce the feature maps to one dimension. Finally, the features are classified by a fully connected layer. The whole process is learned end to end, and the model parameters are updated iteratively by stochastic gradient descent.
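A minimal NumPy sketch of the residual preprocessing block described above can make the two-branch structure concrete. The weights below are random and untrained, inference-style per-channel normalization stands in for learned batch-norm statistics, and all weight scales are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W); a 1x1 conv is a per-pixel matmul
    return np.tensordot(w, x, axes=([1], [0]))

def conv3x3(x, w):
    # x: (C_in, H, W), w: (C_out, C_in, 3, 3); stride 1, zero padding
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.tensordot(w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd],
                                axes=([1], [0]))
    return out

def bn_relu(x):
    # per-channel normalization (inference-style stand-in for batch norm) + ReLU
    m = x.mean(axis=(1, 2), keepdims=True)
    s = x.std(axis=(1, 2), keepdims=True) + 1e-5
    return np.maximum((x - m) / s, 0.0)

def residual_preprocess(img):
    # img: (1, H, W) single-channel input
    x = bn_relu(conv1x1(img, rng.normal(0, 0.1, (64, 1))))        # 64 kernels, 1x1
    x = bn_relu(conv3x3(x, rng.normal(0, 0.05, (64, 64, 3, 3))))  # 64 kernels, 3x3
    f = bn_relu(conv1x1(x, rng.normal(0, 0.1, (256, 64))))        # F(I, Wk): 256 kernels, 1x1
    ws = bn_relu(conv1x1(img, rng.normal(0, 0.1, (256, 1))))      # Ws*I shortcut: 256 kernels, 1x1
    return np.maximum(f + ws, 0.0)                                # R = F(I, Wk) + Ws*I, then ReLU

r = residual_preprocess(rng.uniform(0.0, 1.0, (1, 16, 16)))
```

The 256-channel output `r` is what the five subsequent convolution units would consume; in the real system all kernels are of course learned by stochastic gradient descent rather than sampled.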
The image feature extraction and classification unit replaces part of the convolution structure in the residual preprocessing convolutional neural network: the invention uses a weight convolution structure to extract the bottom-layer features of the digital image and the correlation features between pixels. A schematic diagram of the digital image device forensics method based on the residual preprocessing convolution fusion network provided by an embodiment of the present invention is shown in fig. 4. Starting from the correlation among the channels of the convolutional neural network, the invention introduces a new structural unit, the weight convolution, to improve the quality of the feature maps generated by the network structure. This is achieved by explicitly modeling the interdependencies between convolutional layer feature channels, allowing the network to perform feature recalibration, i.e. to obtain a weighting of the feature representation that selectively enhances valid information and suppresses invalid information. The weight convolution structure consists of two parts, a squeeze unit and an excitation unit, which together can construct any given affine transformation. For simplicity of explanation, the notation U denotes the output of a convolution operation, V the set of learned convolution kernels, and v_c the parameters of the c-th convolution kernel.
u_c = v_c * X = Σ_{s=1}^{C'} v_c^s * x^s (6)

where * denotes the convolution operation and v_c^s is the 2-dimensional spatial kernel of v_c that acts on the corresponding channel x^s of the input X. Because the output u_c is obtained as a summation over all channels, the dependencies between channels are implicitly embedded in v_c, so the channel correlation modeled by convolution is local in nature. The present invention instead grants the network access to global information and assigns weights to the features of the individual channels. Recalibrating the filter responses by explicitly modeling the interdependencies between channels mainly comprises two steps: a squeeze unit and an excitation unit. In the squeeze unit, in order to explore the inter-channel correlation, the global spatial information is squeezed into a channel descriptor; this is implemented by global average pooling, which yields channel-wise statistics. Formally, z is the output obtained from U by contracting the spatial dimensions, and for the c-th channel z_c can be calculated by
z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j) (7)

where z_c represents the c-th element of the channel descriptor obtained from U by spatial contraction, u_c represents the c-th input feature map, and H, W represent its height and width.
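The squeeze step is simply a global average pool per channel, which a short sketch makes explicit (the input values are arbitrary):

```python
import numpy as np

def squeeze(u):
    # u: (C, H, W) feature maps -> z: (C,) channel descriptor via global average pooling
    return u.mean(axis=(1, 2))

u = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # two 4x4 feature maps
z = squeeze(u)
```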
In order to use the information aggregated in the squeeze unit, it is followed by a second operation, the excitation unit, which aims to fully capture the channel-wise dependencies by means of a mapping function. To achieve this goal, the mapping function should satisfy two principles:
1. it must be flexible (in particular, it must be able to learn the nonlinear interactions between different channels);
2. it must be able to learn non-mutually-exclusive relationships, because multiple channels should be allowed to be strengthened at once rather than a single active channel being emphasized. To satisfy these principles, a simple gating mechanism with a sigmoid activation function is chosen, defined as follows
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z)) (8)

where s represents the output of the excitation unit, z the input, and W_1, W_2 the weight parameters; σ denotes the sigmoid gating function and δ the ReLU activation function. To limit the complexity of the model and aid generalization, the gating mechanism is parameterized as a bottleneck of two non-linear fully connected layers: a dimensionality reduction layer with parameters W_1 and reduction ratio 16, a ReLU activation layer, and a dimensionality increasing layer with parameters W_2. The output of the final block is obtained by a channel-wise scaling transform, as shown below
x̃_c = F_scale(u_c, s_c) = s_c · u_c (9)

where F_scale(u_c, s_c) denotes the channel-wise product between the scale factor s_c and the feature map u_c. The excitation unit thus maps the input z to a set of channel-specific weights. In this sense, the weight convolution intrinsically introduces dynamics conditioned on the input, helping to enhance the discriminability of the features.
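The squeeze, excitation (formula (8)) and channel-wise scaling steps can be sketched together as follows. The weights are random and untrained, the reduction ratio 16 follows the text, and everything else is an illustrative assumption; because the gate output lies in (0, 1), recalibration can only attenuate each channel:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def excite_and_scale(u, w1, w2):
    # u: (C, H, W); w1: (C//r, C) reduction layer; w2: (C, C//r) expansion layer
    z = u.mean(axis=(1, 2))                    # squeeze: global average pool per channel
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: sigmoid(W2 ReLU(W1 z))
    return s[:, None, None] * u                # scale: channel-wise product s_c * u_c

c, r = 32, 16
u = rng.normal(size=(c, 8, 8))
w1 = rng.normal(0, 0.1, (c // r, c))
w2 = rng.normal(0, 0.1, (c, c // r))
x_tilde = excite_and_scale(u, w1, w2)
```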
The bottom-layer features and inter-pixel correlation features output by the excitation unit are then input into the conventional convolution structure.
The digital image equipment discrimination result output unit first trains the residual preprocessing weight convolution and traditional convolution fusion network on the training set data to obtain an effective network model for device forensics; the input digital image is then fed to this fusion network model, and the discrimination result of the digital image acquisition device is given as the output of the device forensics system.
Inputting an image into the digital image device forensics system of the residual preprocessing weight convolution and traditional convolution fusion network yields an output result from which the name of the acquisition device corresponding to the digital image can be identified.
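The discrimination step can be sketched as follows; `dummy_model` is a hypothetical stand-in for the trained fusion network, which would return a 1xN score vector per 64x64 block, and the device names are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(4)

def crop_blocks(img, size=64):
    # split an image into non-overlapping size x size blocks
    h, w = img.shape
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def identify_device(img, model, device_names):
    # average the per-block score vectors, then pick the highest-scoring device ID
    scores = np.mean([model(b) for b in crop_blocks(img)], axis=0)
    return device_names[int(np.argmax(scores))]

def dummy_model(block):
    # hypothetical stand-in: always favors the second device
    s = np.zeros(3)
    s[1] = block.mean()
    return s

names = ["cam_A", "cam_B", "cam_C"]
img = rng.uniform(1.0, 2.0, (128, 128))
pred = identify_device(img, dummy_model, names)
```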
Example two
In order to effectively illustrate the performance of the present invention, the experimental results are shown and analyzed using figures and table data, demonstrating that the invention performs well.
The experimental images come from a public device forensics image library, from which 9 devices of different brands are selected; the device list used is shown in fig. 5. Images are cropped into non-overlapping 64x64 image blocks. The training, validation and test sets contain 2757888 and 689472 samples, respectively. The initial learning rate of the algorithm is set to 0.01 and is decreased by 10% every 10000 iterations; the maximum number of iterations is set to 50K, and the momentum is set to 0.9.
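The training schedule described above can be sketched as follows; "decreased by 10% per 10000 iterations" is read here as multiplying the rate by 0.9 at each step (a multiply-by-0.1 reading is also possible), and the momentum update is the standard SGD form rather than anything specific to the patent:

```python
def learning_rate(iteration, base_lr=0.01, decay=0.10, step=10000):
    # lr reduced by 10% every 10000 iterations (one possible reading of the text)
    return base_lr * (1.0 - decay) ** (iteration // step)

def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    # classical momentum update used to iterate the model parameters
    v = momentum * v - lr * grad
    return w + v, v

lr0 = learning_rate(0)        # initial rate
lr1 = learning_rate(10000)    # after the first decay step
w, v = sgd_momentum_step(1.0, 0.0, 0.5, lr0)
```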
The performance comparison of the device forensics systems is shown in fig. 6, verified on the 9 devices of fig. 5. The experimental results show that the proposed digital image device forensics system based on the residual learning weight convolution and traditional convolution fusion network performs best. HP-CNN and CV7-CNN denote the high-pass filtering preprocessing CNN and the convolution preprocessing CNN, with discrimination accuracies of 81.34% and 86.01%, respectively, as shown in fig. 6. Res-CNN denotes the residual learning preprocessing CNN, with a discrimination accuracy of 90.64%. These results show that the residual preprocessing greatly improves the discrimination performance of the system. With residual learning fixed as the preprocessing, the contribution of the weight convolution to the system was then verified: replacing the first convolution unit of the Res-CNN model with a weight convolution yields Res-SE-one, i.e. the digital image device forensics system based on the residual preprocessing weight convolution and traditional convolution fusion network; replacing the first and second convolution units yields Res-SE-two. Their accuracies are 96.23% and 95.51%, respectively. The experimental results demonstrate that the digital image device forensics system based on the residual preprocessing weight convolution and traditional convolution fusion network achieves the best discrimination performance on small-size images.
In conclusion, the embodiment of the invention fuses residual learning preprocessing into the traditional convolutional neural network structure, which improves the signal-to-noise ratio of the device forensics features and significantly improves the identification performance of the model; using weight convolution in the lower layers of the weight convolution and traditional convolution fusion network effectively improves the accuracy of feature extraction and classification. By integrating residual learning preprocessing, weight convolution and the traditional convolution fusion network, the method can effectively perform device forensics on low-resolution digital images and effectively improves the device identification performance.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, for apparatus or system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (2)

1. A digital image equipment forensics system based on a residual learning convolution fusion network, comprising: a digital image input unit, a digital image preprocessing unit, an image feature extraction and classification unit, and a digital image equipment identification result output unit;
the digital image input unit is used for receiving an externally input digital image and transmitting the digital image to the digital image preprocessing unit;
the digital image preprocessing unit is used for preprocessing the digital image transmitted by the digital image input unit, the preprocessing comprises residual error learning filtering processing, and the preprocessed digital image is transmitted to the image feature extraction and classification unit;
the image feature extraction and classification unit is used for extracting bottom-layer features and high-layer semantic features of the preprocessed digital images transmitted by the digital image preprocessing unit by using a weight convolution and traditional convolution fusion network, and transmitting the obtained high-layer semantic features of the digital images to the digital image equipment discrimination result output unit;
the digital image equipment identification result output unit is used for obtaining equipment related characteristics according to the high-level semantic characteristics of the digital images transmitted by the image characteristic extraction and classification unit and outputting equipment evidence obtaining results of the digital images according to the equipment related characteristics;
the system of (a), wherein the digital image is a 64x64 resolution digital image;
the system is characterized in that: the process of residual learning filtering processing by the digital image preprocessing unit comprises the following steps: performing convolution operation by 64 convolution kernels with the size of 1x1 by one step, and then performing block normalization and ReLU activation function operation; then carrying out convolution operation next to 64 convolution kernels with the size of 3x3 and one step, and then carrying out block normalization and ReLU activation function operation; then, carrying out convolution operation on 256 convolution kernels with 1x1 size and one-step length, and then carrying out block normalization and ReLU activation function operation to obtain an output result of F (I, W)k) (ii) a Carrying out convolution operation on the image by 256 convolution kernels with 1x1 size and one-step length, and then carrying out block normalization and ReLU activation function operation to obtain an output result WsI; and adding the two output results and performing ReLU activation operation to obtain R ═ F (I, W)k)+WsI, residual error learning filtering processing of the image is realized;
the system is characterized in that: the image feature extraction and classification unit is used for extracting the bottom layer features and the inter-pixel correlation features of the preprocessed digital image by using a weight convolution and weight convolution structure in a traditional convolution fusion network, inputting the bottom layer features and the inter-pixel correlation features into the traditional convolution structure in the weight convolution and traditional convolution fusion network, and extracting the high-level semantic features of the digital image through the traditional convolution structure, wherein the high-level semantic features are the device correlation features;
the system is characterized in that the weight convolution structure in the image feature extraction and classification unit comprises a squeezing unit and an excitation unit, wherein the squeezing unit and the excitation unit can construct any given affine transformation;
the squeezing unit is used for squeezing the global spatial information to obtain a channel descriptor, using global average pooling to obtain channel-wise statistics; formally, z is the output obtained from U by contracting the spatial dimensions, and for the c-th channel, z_c can be calculated by the following formula:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j) (7)

the parameter u_c(i, j) represents the feature map of the c-th channel, and the parameters H, W represent the height and width of the feature map, respectively;
the excitation unit is used for capturing the channel-by-channel dependency relationship by using a mapping function, and the excitation unit is defined as follows:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z)) (8)

in the above formula, σ represents the sigmoid gating function and δ represents the ReLU activation function; s represents the output of the excitation unit, z the input, and W_1, W_2 the weight parameters;
the output of the image feature extraction and classification unit is obtained by a channel-wise scaling transform, as follows:

x̃_c = F_scale(u_c, s_c) = s_c · u_c (9)

where F_scale(u_c, s_c) is the channel-wise product between the scale factor s_c and the feature map u_c, and the excitation unit maps the input z to a set of channel-specific weights;
and inputting the bottom layer characteristic and the inter-pixel correlation characteristic output by the excitation unit into a conventional convolution structure.
2. The system of claim 1, wherein:
the digital image equipment identification result output unit is used for obtaining a device forensics result in the form of a 1xN-dimensional vector, where N is the number of digital image acquisition devices; the device ID corresponding to the highest score in the 1xN-dimensional vector is selected and taken as the output result, that is, the digital image to be examined is identified as having been acquired by the device with that ID.
CN201910472188.2A 2019-05-31 2019-05-31 Digital image equipment evidence obtaining system based on residual learning convolution fusion network Active CN110210498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472188.2A CN110210498B (en) 2019-05-31 2019-05-31 Digital image equipment evidence obtaining system based on residual learning convolution fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910472188.2A CN110210498B (en) 2019-05-31 2019-05-31 Digital image equipment evidence obtaining system based on residual learning convolution fusion network

Publications (2)

Publication Number Publication Date
CN110210498A CN110210498A (en) 2019-09-06
CN110210498B true CN110210498B (en) 2021-08-10

Family

ID=67790152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472188.2A Active CN110210498B (en) 2019-05-31 2019-05-31 Digital image equipment evidence obtaining system based on residual learning convolution fusion network

Country Status (1)

Country Link
CN (1) CN110210498B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619631A (en) * 2019-09-16 2019-12-27 中山大学 Super-resolution image detection method based on residual error network
CN111178166B (en) * 2019-12-12 2023-05-26 中国科学院深圳先进技术研究院 Camera source identification method based on image content self-adaption
CN111325687B (en) * 2020-02-14 2022-10-14 上海工程技术大学 Smooth filtering evidence obtaining method based on end-to-end deep network
CN111669527B (en) * 2020-07-01 2021-06-08 浙江大学 Convolution operation circuit in CMOS image sensor
CN112233035A (en) * 2020-10-21 2021-01-15 中国人民公安大学 Image PRNU noise purification method and system based on sample mismatch training
CN112464732B (en) * 2020-11-04 2022-05-03 北京理工大学重庆创新中心 Optical remote sensing image ground feature classification method based on double-path sparse hierarchical network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097379A (en) * 2016-07-22 2016-11-09 宁波大学 A kind of distorted image detection using adaptive threshold and localization method
WO2017149315A1 (en) * 2016-03-02 2017-09-08 Holition Limited Locating and augmenting object features in images
CN108830322A (en) * 2018-06-15 2018-11-16 联想(北京)有限公司 A kind of image processing method and device, equipment, storage medium
CN109598227A (en) * 2018-11-28 2019-04-09 厦门大学 A kind of single image mobile phone source weight discrimination method based on deep learning
CN109635791A (en) * 2019-01-28 2019-04-16 深圳大学 A kind of video evidence collecting method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pengpeng Yang, Rongrong Ni, Yao Zhao, Wei Zhao; "Source camera identification based on content-adaptive fusion residual networks"; Pattern Recognition Letters, no. 119, pp. 195-204, 2019-03-01. *

Also Published As

Publication number Publication date
CN110210498A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110210498B (en) Digital image equipment evidence obtaining system based on residual learning convolution fusion network
Park et al. Double JPEG detection in mixed JPEG quality factors using deep convolutional neural network
Gallagher et al. Image authentication by detecting traces of demosaicing
CN106530200B (en) Steganographic image detection method and system based on deep learning model
Al-Ani et al. On the SPN estimation in image forensics: a systematic empirical evaluation
Ahmed et al. Comparative analysis of a deep convolutional neural network for source camera identification
Gupta et al. A study on source device attribution using still images
Tsai et al. Using decision fusion of feature selection in digital forensics for camera source model identification
CN109034230B (en) Single image camera tracing method based on deep learning
Korus et al. Content authentication for neural imaging pipelines: End-to-end optimization of photo provenance in complex distribution channels
Gao et al. Camera model identification based on the characteristic of CFA and interpolation
Swaminathan et al. Component forensics
Mehrish et al. Robust PRNU estimation from probabilistic raw measurements
Berthet et al. A review of data preprocessing modules in digital image forensics methods using deep learning
Chen et al. Image splicing localization using residual image and residual-based fully convolutional network
Chetty et al. Nonintrusive image tamper detection based on fuzzy fusion
Jahanirad et al. An evolution of image source camera attribution approaches
Chang et al. A passive multi-purpose scheme based on periodicity analysis of CFA artifacts for image forensics
Xiao et al. Effective PRNU extraction via densely connected hierarchical network
Bennabhaktula et al. Device-based image matching with similarity learning by convolutional neural networks that exploit the underlying camera sensor pattern noise
Wang et al. Source camera identification forensics based on wavelet features
CN112560734A (en) Method, system, device and medium for detecting reacquired video based on deep learning
Wang et al. A unified framework of source camera identification based on features
Wu et al. Review of imaging device identification based on machine learning
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant