EP1971962A2 - Method for identifying marked images based at least in part on frequency domain coefficient differences - Google Patents

Method for identifying marked images based at least in part on frequency domain coefficient differences

Info

Publication number
EP1971962A2
EP1971962A2 EP06718464A EP06718464A EP1971962A2 EP 1971962 A2 EP1971962 A2 EP 1971962A2 EP 06718464 A EP06718464 A EP 06718464A EP 06718464 A EP06718464 A EP 06718464A EP 1971962 A2 EP1971962 A2 EP 1971962A2
Authority
EP
European Patent Office
Prior art keywords
coefficient difference
image
analysis
array
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP06718464A
Other languages
German (de)
French (fr)
Other versions
EP1971962A4 (en
Inventor
Yun-Qing Shi
Chunhua Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Jersey Institute of Technology
Original Assignee
New Jersey Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Jersey Institute of Technology
Publication of EP1971962A2
Publication of EP1971962A4
Withdrawn (current legal status)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/80Recognising image objects characterised by unique random patterns
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32154Transform domain methods

Definitions

  • This application is related to classifying or identifying content, such as marked images, for example.
  • FIG. 1 is a schematic diagram illustrating one embodiment of a portion of a frequency domain coefficient 2-D array
  • FIG. 2A-D are schematic diagrams illustrating one embodiment of a technique to generate frequency domain coefficient array differences
  • FIGs. 3A and B are plots illustrating the distribution of coefficient array differences for a set of images
  • FIG. 4 is a schematic diagram illustrating an embodiment for forming a one-step transition probability matrix, such as to characterize a Markov process
  • FIG. 5 is a block diagram illustrating one embodiment of generating features.
  • one embodiment described herein includes a method based at least in part on statistical moments derived at least in part from an image 2-D array and a JPEG 2-D array.
  • a first order histogram and/or a second order histogram may be employed, although claimed subject matter is not limited in scope in this respect.
  • higher order histograms may be utilized in other embodiments, for example.
  • moments of 2-D characteristic functions are also used, although, again, other embodiments are not limited in this respect. For example, higher order moments may be employed.
  • OutGuess embeds the to-be-hidden data using redundancy of the cover image.
  • the cover image refers to the content without the hidden data embedded.
  • OutGuess attempts to preserve statistics based at least in part on the BDCT histogram.
  • OutGuess identifies redundant BDCT coefficients and embeds data into these coefficients to reduce effects from data embedding. Furthermore, it adjusts coefficients in which data has not been embedded to attempt to preserve the original BDCT histogram.
  • F5, developed from Jsteg, F3, and F4, employs the following techniques: straddling and matrix coding. Straddling scatters the message as uniformly as possible over a cover image. Matrix coding tends to improve embedding efficiency (defined here as the number of embedded bits per change of the BDCT coefficient). MB embedding tries to make the embedded data correlated to the cover medium.
  • This is implemented by splitting the cover medium into two parts, modeling the parameter of the distribution of the second part given the first part, encoding the second part by using the model and to-be-embedded message, and then combining the two parts to form the stego medium.
  • Cauchy distribution is used to model the JPEG BDCT mode histogram and the embedding attempts to keep the lower precision histogram of the BDCT modes unchanged.
  • A universal steganalysis method using higher order statistics has been proposed by Farid. See H. Farid, "Detecting hidden messages using higher-order statistical models", International Conference on Image Processing, Rochester, NY, USA, 2002. (hereinafter, "Farid")
  • Quadrature mirror filters are used to decompose a test image into wavelet subbands.
  • the higher order statistics are calculated from wavelet coefficients of high-frequency subbands to form a group of features.
  • Another group of features is similarly formulated from the prediction errors of wavelet coefficients of high-frequency subbands.
  • this method uses a Markov chain along a horizontal direction and, thus, this approach does not necessarily reflect the 2-D nature of a digital image.
  • JPEG 2-D arrays are formed based at least in part on JPEG quantized block DCT coefficients.
  • difference JPEG 2-D arrays may be formed along horizontal, vertical and diagonal directions for this particular embodiment and a Markov process may be applied to model these difference JPEG 2-D arrays so as to utilize second order statistics for steganalysis.
  • a thresholding technique may be applied to reduce the dimensionality of transition probability matrices, thus making the computational complexity of the scheme more manageable.
  • steganalysis is considered as a task of two-class pattern recognition. That is, a given image may be classified as either a stego image (with hidden data) or as a non-stego image (without hidden data).
  • a JPEG 2-D array is formed.
  • a difference JPEG 2-D array along different directions is formed.
  • a transition probability matrix may be constructed to characterize the Markov process. Features may then be derived from this transition probability matrix.
  • the so-called one-step transition probability matrix is employed here for reduced computational complexity, although claimed subject matter is not limited in scope in this respect. For example, more complex transition probability matrices may be employed in other embodiments.
  • a thresholding technique is also applied, as described in more detail below.
  • features are to be generated from a block DCT representation of an image; however, claimed subject matter is not limited in scope in this respect.
  • other frequency domain representations of an image may be employed. Nonetheless, for this particular embodiment, it is desirable to examine the properties of JPEG BDCT coefficients.
  • For a given image, consider a 2-D array comprising 8x8 block DCT coefficients which have been quantized with a JPEG quantization table, but not zig-zag scanned, run-length coded or Huffman coded. That is, this 2-D array has the same size as the given image, with a given 8x8 block filled up with the corresponding JPEG quantized 8x8 block DCT coefficients.
  • this resultant 2-D array is referred to as a JPEG 2-D array.
  • the features for this particular embodiment are to be formed from a JPEG 2-D array.
  • JPEG BDCT quantized coefficients may be either positive, or negative, or zero.
  • BDCT coefficients in general do not obey a Gaussian distribution; however, these coefficients are not necessarily statistically independent of each other.
  • the magnitude of the non-zero BDCT coefficients may be correlated along the zigzag scan order, for example.
  • a correlation may exist among absolute values of the BDCT coefficients along horizontal, vertical and diagonal directions.
  • That is, the differences of the absolute values of two immediately (horizontally in Figure 3) neighboring BDCT coefficients are highly concentrated around 0, having a Laplacian-like distribution.
  • a similar observation may be made along vertical and diagonal directions.
  • this particular embodiment may exploit this aspect of the coefficients, although, of course, claimed subject matter is not limited in scope in this respect.
  • a disturbance introduced by data embedding manifests itself more apparently in a prediction-error image than in an original image.
  • difference arrays may be generated as follows:
  • the distribution of elements of the above-described difference arrays may be Laplacian-like. Most of the difference values are close to zero.
  • an image set comprising 7560 JPEG images with quality factors ranging from 70 to 90 was accumulated.
  • the arithmetic average of the histograms of the horizontal difference JPEG 2-D arrays generated from this JPEG image set and the histogram of the horizontal difference JPEG 2-D array generated from a randomly selected image from this set of images are shown in Figure 3 (a) and (b), respectively. From this figure, most elements in the horizontal difference JPEG 2-D arrays fall into the interval [-T, T] as long as T is large enough.
  • 91.99% is the mean; that is, on average 91.99% of the elements of horizontal difference arrays generated from the image set fall into the range [-4, 4].
  • the standard deviation is 2.836%.
  • a difference JPEG 2-D array is characterized by using a Markov random process.
  • a transition probability matrix may be used to characterize the Markov process.
  • a one-step transition probability matrix is employed for this embodiment, as shown in Fig. 4, although claimed subject matter is not limited in scope in this respect.
  • a thresholding technique may also be employed, although claimed subject matter is not limited in scope in this respect.
  • a threshold value, here T.
  • those elements in a difference JPEG 2-D array whose values fall into {-T, -T+1, ..., -1, 0, 1, ..., T-1, T} are considered. If an element has a value either larger than T or smaller than -T, it will be represented by T or -T correspondingly.
  • This procedure results in a transition probability matrix of dimensionality (2T+1) × (2T+1).
  • a threshold level may vary. Nonetheless, for this embodiment, the elements of these four matrixes associated with horizontal, vertical, main diagonal and minor diagonal difference JPEG 2-D arrays are given by:
  • (2T+1) × (2T+1) elements are obtained for a transition probability matrix.
  • 4 × (2T+1) × (2T+1) elements are produced.
  • these may be employed as features for steganalysis.
  • 4 × (2T+1) × (2T+1)-dimensional feature vectors have been produced for steganalysis for this particular embodiment.
  • T in this example is set to 4, although claimed subject matter is not limited in scope in this respect.
  • if an element has an absolute value larger than 4, this element is reassigned an absolute value of 4 without sign change.
  • Feature construction for this particular embodiment is illustrated by a block diagram shown in Fig. 5.
  • analysis of variance process refers to processes or techniques that may be applied so that differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation to correlate, segment, classify, analyze or otherwise characterize data based at least in part on application of such processes or techniques.
  • artificial intelligence techniques and processes including pattern recognition; neural networks; genetic processes; heuristics; and support vector machines (SVM).
  • SVM may, for example, be employed to handle linear and non-linear cases or situations.
  • an SVM classifier may be applied to search for a hyper-plane that separates a positive pattern from a negative pattern.
  • a support vector machine (SVM) is used as a classifier.
  • SVM is based at least in part on the idea of a hyperplane classifier. It uses Lagrangian multipliers to find a separation hyperplane which distinguishes the positive pattern from the negative pattern. If the feature vectors are one-dimensional (1-D), the separation hyperplane reduces to a point on the number axis. SVM can handle both linearly separable and non-linearly separable cases.
  • a selection ⁇ from the data may be classified using w
  • a "learning machine” may map input feature vectors to a higher dimensional space in which a linear hyper-plane may potentially be located.
  • a transformation from nonlinear feature space to linear higher dimensional space may be performed using a kernel function.
  • kernels include: linear, polynomial, radial basis function and sigmoid.
  • a linear kernel may be employed in connection with a linear SVM process, for example.
  • other kernels may be employed in connection with a non-linear SVM process.
  • a polynomial kernel was employed.
  • An image database comprising 7,560 JPEG images with quality factors ranging from 70 to 90 was employed.
  • One third of these images were an essentially random set of pictures taken at different times and places with different digital cameras. The other two thirds were downloaded from the Internet.
  • Each image was cropped (central portion) to the size of either 768x512 or 512x768.
  • chrominance components of the images are set to be zero while luminance coefficients are unaltered before data embedding.
  • one embodiment may comprise one or more articles, such as a
  • This storage media such as, one or more CD-ROMs and/or disks, for example, may have stored thereon instructions, that when executed by a system, such as a computer system, computing platform, or other system, for example, may result in an embodiment of a method in accordance with claimed subject matter being executed, such as
  • a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive.
  • a display may be employed to display one or more queries, such as those that may be interrelated, and/or one or more tree expressions, although, again, claimed subject matter is not limited in scope to this example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Analysis (AREA)

Abstract

Briefly, in accordance with one embodiment, a method of identifying marked images based at least in part on frequency domain coefficient differences is disclosed.

Description

METHOD FOR IDENTIFYING MARKED IMAGES BASED AT LEAST IN PART ON FREQUENCY DOMAIN COEFFICIENT DIFFERENCES
FIELD
This application is related to classifying or identifying content, such as marked images, for example.
BACKGROUND
In recent years digital data hiding has become an active research field. Various kinds of data hiding methods have been proposed. Some methods aim at content protection, and/or authentication, while some aim at covert communication. The latter category of data hiding is referred to here as steganography.
BRIEF DESCRIPTION OF THE DRAWINGS
Subject matter is particularly pointed out and distinctly claimed in the concluding portion of the specification. Claimed subject matter, however, both as to organization and method of operation, together with objects, features, and/or advantages thereof, may best be understood by reference to the following detailed description if read with the accompanying drawings in which:
FIG. 1 is a schematic diagram illustrating one embodiment of a portion of a frequency domain coefficient 2-D array;
FIG. 2A-D are schematic diagrams illustrating one embodiment of a technique to generate frequency domain coefficient array differences;
FIGs. 3A and B are plots illustrating the distribution of coefficient array differences for a set of images;
FIG. 4 is a schematic diagram illustrating an embodiment for forming a one-step transition probability matrix, such as to characterize a Markov process; and
FIG. 5 is a block diagram illustrating one embodiment of generating features.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of claimed subject matter. However, it will be understood by those skilled in the art that claimed subject matter may be practiced without these specific details. In other instances, well known methods, procedures, components and/or circuits have not been described in detail so as not to obscure claimed subject matter.
Some portions of the detailed description which follow are presented in terms of algorithms and/or symbolic representations of operations on data bits and/or binary digital signals stored within a computing system, such as within a computer and/or computing system memory. These algorithmic descriptions and/or representations are the techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations and/or similar processing leading to a desired result. The operations and/or processing may involve physical manipulations of physical quantities. Typically, although not necessarily, these quantities may take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared and/or otherwise manipulated. It has proven convenient, at times, principally for reasons of common usage, to refer to these signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals and/or the like. It should be understood, however, that all of these and similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as "processing", "computing", "calculating", "determining" and/or the like refer to the actions and/or processes of a computing platform, such as a computer or a similar electronic computing device, that manipulates and/or transforms data represented as physical electronic and/or magnetic quantities and/or other physical quantities within the computing platform's processors, memories, registers, and/or other information storage, transmission, and/or display devices.
Owing to the popular usage of JPEG images, steganographic tools for JPEG images are emerging at an increasing rate, among which model-based steganography (MB), F5 and OutGuess are the most advanced. However, it continues to be desirable to develop new tools to identify images that include hidden data. In accordance with claimed subject matter, one embodiment described herein includes a method based at least in part on statistical moments derived at least in part from an image 2-D array and a JPEG 2-D array. In this particular embodiment, a first order histogram and/or a second order histogram may be employed, although claimed subject matter is not limited in scope in this respect. For example, higher order histograms may be utilized in other embodiments. However, continuing with this particular embodiment, from these histograms, moments of 2-D characteristic functions are also used, although, again, other embodiments are not limited in this respect. For example, higher order moments may be employed.
The popularity of computer utilization accelerates the widespread use of the Internet. As a result, millions of pictures flow on the Internet every day. Nowadays, the interchange of JPEG (Joint Photographic Experts Group) images becomes more and more frequent. Many steganographic techniques operating on JPEG images have been published and have become publicly available. Most of the techniques in this category appear to modify 8x8 block discrete cosine transform (BDCT) coefficients in the JPEG domain to embed hidden data. Among the steganographic techniques, the recently published schemes OutGuess, F5, and model-based steganography (MB) appear to be the most advanced. See N. Provos, "Defending against statistical steganalysis," 10th USENIX Security Symposium, Washington DC, USA, 2001; A. Westfeld, "F5 a steganographic algorithm: High capacity despite better steganalysis," 4th International Workshop on Information Hiding, Pittsburgh, PA, USA, 2001; P. Sallee, "Model-based steganography," International Workshop on Digital Watermarking, Seoul, Korea, 2003. OutGuess embeds the to-be-hidden data using redundancy of the cover image. In this context, the cover image refers to the content without the hidden data embedded. For JPEG images, OutGuess attempts to preserve statistics based at least in part on the BDCT histogram. To further this, OutGuess identifies redundant BDCT coefficients and embeds data into these coefficients to reduce effects from data embedding. Furthermore, it adjusts coefficients in which data has not been embedded to attempt to preserve the original BDCT histogram. F5, developed from Jsteg, F3, and F4, employs the following techniques: straddling and matrix coding. Straddling scatters the message as uniformly as possible over a cover image. Matrix coding tends to improve embedding efficiency (defined here as the number of embedded bits per change of the BDCT coefficient). MB embedding tries to make the embedded data correlated to the cover medium. This is implemented by splitting the cover medium into two parts, modeling the parameter of the distribution of the second part given the first part, encoding the second part by using the model and to-be-embedded message, and then combining the two parts to form the stego medium. Specifically, the Cauchy distribution is used to model the JPEG BDCT mode histogram and the embedding attempts to keep the lower precision histogram of the BDCT modes unchanged.
To detect hidden information in a stego image, many steganalysis methods have been proposed. A universal steganalysis method using higher order statistics has been proposed by Farid. See H. Farid, "Detecting hidden messages using higher-order statistical models", International Conference on Image Processing, Rochester, NY, USA, 2002 (hereinafter, "Farid"). Quadrature mirror filters are used to decompose a test image into wavelet subbands. The higher order statistics are calculated from wavelet coefficients of high-frequency subbands to form a group of features. Another group of features is similarly formulated from the prediction errors of wavelet coefficients of high-frequency subbands. In Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, C. Chen, "Steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network," International Conference on Multimedia and Expo, Amsterdam, Netherlands, 2005 (hereinafter, "Shi et al."), a described method employs statistical moments of characteristic functions of a test image, its prediction-error image, and their discrete wavelet transform (DWT) subbands as features.
However, a steganalysis method specifically designed for addressing JPEG steganographic schemes has been proposed by Fridrich. See J. Fridrich, "Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes," 6th Information Hiding Workshop, Toronto, ON, Canada, 2004. With a relatively small set of well-selected features, this method outperforms other steganalysis methods, such as those previously mentioned, when detecting images that have hidden data created by OutGuess, F5 and MB. See M. Kharrazi, H. T. Sencar, N. D. Memon, "Benchmarking steganographic and steganalysis techniques", Security, Steganography, and Watermarking of Multimedia Contents 2005, San Jose, CA, USA, 2005.
Recently, a scheme was developed to detect data hidden with a spread spectrum method, in which the inter-pixel dependencies are used and a Markov chain model is adopted. See K. Sullivan, U. Madhow, S. Chandrasekaran, and B. S. Manjunath, "Steganalysis of Spread Spectrum Data Hiding Exploiting Cover Memory", the International Society for Optical Engineering, Electronic Imaging, San Jose, CA, USA, 2005. In this approach, an empirical transition matrix of a given test image is formed. This matrix has a dimensionality of 256x256 for a grayscale image with a bit depth of 8. That is, this matrix has 65,536 elements. This large number of elements makes using all of the elements as features challenging. The authors therefore selected several of the largest probabilities of the matrix along the main diagonal together with their neighbors, and some other randomly selected probabilities along the main diagonal, as features. Of course, some information loss is inevitable due to this feature selection process. Furthermore, this method uses a Markov chain along a horizontal direction and, thus, this approach does not necessarily reflect the 2-D nature of a digital image.
Distinguishing JPEG images in which data has been hidden from JPEG images that do not contain hidden data continues to be desirable. One embodiment in accordance with claimed subject matter involves employing JPEG 2-D arrays. In this particular embodiment, a JPEG 2-D array is formed based at least in part on JPEG quantized block DCT coefficients. Likewise, difference JPEG 2-D arrays may be formed along horizontal, vertical and diagonal directions for this particular embodiment and a Markov process may be applied to model these difference JPEG 2-D arrays so as to utilize second order statistics for steganalysis. In addition to the utilization of difference JPEG 2-D arrays, a thresholding technique may be applied to reduce the dimensionality of transition probability matrices, thus making the computational complexity of the scheme more manageable.
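The effect of thresholding on dimensionality can be illustrated with a short arithmetic sketch. It assumes the figures stated elsewhere in this document: four difference arrays, a transition probability matrix of (2T+1) x (2T+1) elements per array, and T = 4 in the example embodiment. The function name `feature_count` is hypothetical and used only for illustration.

```python
def feature_count(T, directions=4):
    """Number of transition-probability elements when each of `directions`
    difference arrays yields a (2T+1) x (2T+1) matrix."""
    return directions * (2 * T + 1) ** 2

# Empirical transition matrix of an 8-bit grayscale image, without thresholding.
full_matrix_elements = 256 * 256

print(full_matrix_elements)  # 65536
print(feature_count(4))      # 324
```

Under these assumptions the thresholded scheme yields a few hundred elements in total, which is why all of them can be used as features without a further selection step.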
For this particular embodiment, steganalysis is considered as a task of two-class pattern recognition. That is, a given image may be classified as either a stego image (with hidden data) or as a non-stego image (without hidden data).
As mentioned previously, modern steganographic methods, such as OutGuess and MB, have made great efforts to keep the changes of BDCT coefficients from data hiding relatively small and therefore more difficult to detect. In particular, they attempt to keep changes on the histogram of JPEG coefficients relatively small. Under these circumstances, therefore, as is employed in this embodiment, higher order statistics as features for steganalysis may be desirable. Here, in particular, for this embodiment, second order statistics are employed; however, claimed subject matter is not limited in scope in this respect.
For this embodiment, a JPEG 2-D array is formed. Likewise, a difference JPEG 2-D array along different directions is formed. To model the difference JPEG 2-D array using a Markov random process, a transition probability matrix may be constructed to characterize the Markov process. Features may then be derived from this transition probability matrix. The so-called one-step transition probability matrix is employed here for reduced computational complexity, although claimed subject matter is not limited in scope in this respect. For example, more complex transition probability matrices may be employed in other embodiments. To further reduce computational complexity, a thresholding technique is also applied, as described in more detail below.
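A minimal sketch of thresholding followed by a one-step transition probability matrix is given here for a horizontal difference array. It rests on two assumptions that are not taken from this part of the specification: the difference array is stored as a list of rows, and each transition probability is estimated as the relative frequency of horizontally adjacent value pairs. The names `clip` and `transition_matrix` and the sample values are hypothetical.

```python
def clip(value, T):
    """Represent values above T by T and values below -T by -T."""
    return max(-T, min(T, value))

def transition_matrix(diff, T=4):
    """Estimate P[m + T][n + T] = Pr(next element == n | current element == m)
    from horizontally adjacent pairs of a thresholded difference array."""
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    for row in diff:
        for cur, nxt in zip(row, row[1:]):
            counts[clip(cur, T) + T][clip(nxt, T) + T] += 1
    matrix = []
    for count_row in counts:
        total = sum(count_row)
        # Rows for values that never occur are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in count_row])
    return matrix

# Illustrative difference values only; -7 and 5 are clipped to -4 and 4.
diff = [[0, 1, 0, -7],
        [0, 0, 1, 5]]
P = transition_matrix(diff, T=4)
features = [p for row in P for p in row]  # flatten into a feature list
print(len(features))  # 81
```

With T = 4 each matrix has 9 x 9 = 81 elements; repeating the procedure for the other difference arrays would give the remaining features.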
For this embodiment, features are to be generated from a block DCT representation of an image; however, claimed subject matter is not limited in scope in this respect. For example, in alternate embodiments, other frequency domain representations of an image may be employed. Nonetheless, for this particular embodiment, it is desirable to examine the properties of JPEG BDCT coefficients.
For a given image, consider a 2-D array comprising 8x8 block DCT coefficients which have been quantized with a JPEG quantization table, but not zig-zag scanned, run-length coded or Huffman coded. That is, this 2-D array has the same size as the given image, with a given 8x8 block filled up with the corresponding JPEG quantized 8x8 block DCT coefficients. Next, apply an absolute value to the DCT coefficients, resulting in a 2-D array as shown in Fig. 1. For this embodiment, this resultant 2-D array is referred to as a JPEG 2-D array. As described in more detail below, the features for this particular embodiment are to be formed from a JPEG 2-D array. Without the application of an absolute value operation, JPEG BDCT quantized coefficients may be positive, negative, or zero. BDCT coefficients in general do not obey a Gaussian distribution; however, these coefficients are not necessarily statistically independent of each other. The magnitude of the non-zero BDCT coefficients may be correlated along the zigzag scan order, for example. Hence, a correlation may exist among absolute values of the BDCT coefficients along horizontal, vertical and diagonal directions. This observation can be further justified by observing Fig. 3. That is, the differences of the absolute values of two immediately (horizontally in Figure 3) neighboring BDCT coefficients are highly concentrated around 0, having a Laplacian-like distribution. A similar observation may be made along vertical and diagonal directions. Thus, as described below, this particular embodiment may exploit this aspect of the coefficients, although, of course, claimed subject matter is not limited in scope in this respect.
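The absolute value step can be sketched as follows. This is only an illustration: it assumes the quantized block DCT coefficients have already been read from the JPEG file into a 2-D list of integers, which is not shown here. The function name `jpeg_2d_array` and the sample values are hypothetical.

```python
def jpeg_2d_array(quantized_coeffs):
    """Return the element-wise absolute value of a 2-D list of
    JPEG quantized block DCT coefficients."""
    return [[abs(c) for c in row] for row in quantized_coeffs]

# Hypothetical corner of a quantized coefficient array (illustrative values only).
sample = [[-26, -3, 0, 2],
          [1, -2, 0, 0]]
F = jpeg_2d_array(sample)
print(F)  # [[26, 3, 0, 2], [1, 2, 0, 0]]
```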
A disturbance introduced by data embedding manifests itself more apparently in a prediction-error image than in an original image. Hence, it is desirable to observe differences between an element and one of its neighbors in a JPEG 2-D array. Therefore, in this particular embodiment, the following four difference JPEG 2-D arrays may be employed, although claimed subject matter is not limited in scope in this respect.
Denote a JPEG 2-D array generated from a given image by $F(u,v)$ ($u \in [1, S_u]$, $v \in [1, S_v]$), where $S_u$ is the size of the JPEG 2-D array in the horizontal direction and $S_v$ in the vertical direction. Then, as shown in Fig. 2, difference arrays may be generated as follows:
$$F_h(u,v) = F(u,v) - F(u+1,v) \qquad (1)$$
$$F_v(u,v) = F(u,v) - F(u,v+1) \qquad (2)$$
$$F_d(u,v) = F(u,v) - F(u+1,v+1) \qquad (3)$$
$$F_{md}(u,v) = F(u+1,v) - F(u,v+1) \qquad (4)$$
where $u \in [1, S_u - 1]$, $v \in [1, S_v - 1]$, and $F_h(u,v)$, $F_v(u,v)$, $F_d(u,v)$, $F_{md}(u,v)$ denote difference arrays in the horizontal, vertical, main diagonal, and minor diagonal directions, respectively.
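The four difference arrays of equations (1) through (4) can be sketched in a few lines of Python. This is a minimal illustration, assuming the JPEG 2-D array is stored as a list of rows, so that the row index plays the role of v and the column index the role of u; the function name `difference_arrays` is hypothetical.

```python
def difference_arrays(F):
    """Return the horizontal, vertical, main diagonal and minor diagonal
    difference arrays of a 2-D array F given as a list of rows.

    F[r][c] corresponds to F(u, v) with u = column index, v = row index.
    Each result has one row and one column fewer than F.
    """
    rows, cols = len(F), len(F[0])
    rng_r, rng_c = range(rows - 1), range(cols - 1)
    horizontal = [[F[r][c] - F[r][c + 1] for c in rng_c] for r in rng_r]      # eq. (1)
    vertical = [[F[r][c] - F[r + 1][c] for c in rng_c] for r in rng_r]        # eq. (2)
    main_diag = [[F[r][c] - F[r + 1][c + 1] for c in rng_c] for r in rng_r]   # eq. (3)
    minor_diag = [[F[r][c + 1] - F[r + 1][c] for c in rng_c] for r in rng_r]  # eq. (4)
    return horizontal, vertical, main_diag, minor_diag
```

Because the input holds absolute values, each difference compares the magnitudes of two neighboring coefficients, which is why most entries are expected to be small.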
As suggested previously, the distribution of elements of the above-described difference arrays may be Laplacian-like. Most of the difference values are close to zero. For evaluation purposes, an image set comprising 7560 JPEG images with quality factors ranging from 70 to 90 was accumulated. The arithmetic average of the histograms of the horizontal difference JPEG 2-D arrays generated from this JPEG image set and the histogram of the horizontal difference JPEG 2-D array generated from a randomly selected image from this set of images are shown in Figure 3 (a) and (b), respectively. From this figure, most elements in the horizontal difference JPEG 2-D arrays fall into the interval [-T, T] as long as T is large enough. Values of the mean and standard deviation of the percentage of elements of horizontal difference JPEG 2-D arrays for the image set falling into [-T, T] for T = {1, 2, 3, 4, 5, 6, 7} are shown in Table 1. Both Figure 3 and Table 1 tend to support the view that the distribution of the elements of the horizontal difference JPEG 2-D arrays is Laplacian-like. Similar observations may be made for difference JPEG 2-D arrays along other directions, such as vertical and diagonal.

Table 1

                          [-1, 1]   [-2, 2]   [-3, 3]   [-4, 4] (*)   [-5, 5]   [-6, 6]   [-7, 7]
Mean (%)                  84.72     88.58     90.66     91.99         92.92     93.60     94.12
Standard deviation (%)    5.657     4.243     3.464     2.836         2.421     2.104     1.850

(*) 91.99% is the mean, meaning that on average 91.99% of the elements of the horizontal difference arrays generated from the image set fall into the range [-4, 4]. The standard deviation is 2.836%.
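The per-image quantity summarized in Table 1 is straightforward to compute. The sketch below, with the hypothetical name `percent_within`, returns the percentage of elements of one difference array that lie in [-T, T]; averaging this value over an image set would give figures of the kind reported in the table.

```python
def percent_within(diff, T):
    """Percentage of elements of a 2-D difference array lying in [-T, T]."""
    values = [x for row in diff for x in row]
    inside = sum(1 for x in values if -T <= x <= T)
    return 100.0 * inside / len(values)
```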
As mentioned before, modern steganographic methods, such as OutGuess and MB, have made efforts to keep the changes that data embedding causes to the histogram of JPEG BDCT coefficients relatively small. Therefore, higher order statistics may be useful for steganalyzing JPEG steganography. In this embodiment, second order statistics are used so as not to significantly increase computational complexity, although depending upon the embodiment and application, use of statistics higher than second order may be desirable.
In this embodiment, a difference JPEG 2-D array is characterized by using a Markov random process. In particular, a transition probability matrix may be used to characterize the Markov process. There are a so-called one-step transition probability matrix and an n-step transition probability matrix. Roughly speaking, the former refers to the transition probabilities between two immediately neighboring elements in a difference JPEG 2-D array, while the latter refers to the transition probabilities between two elements separated by (n-1) elements. For a balance between steganalysis capability and manageable computational complexity, a one-step transition probability matrix is employed for this embodiment, as shown in Fig. 4, although claimed subject matter is not limited in scope in this respect. To further reduce computational complexity, a thresholding technique may also be employed, although claimed subject matter is not limited in scope in this respect. In this embodiment, a threshold value, here T, is employed. Thus, those elements in a difference JPEG 2-D array whose values fall into {-T, -T+1, ..., -1, 0, 1, ..., T-1, T} are considered. If an element has a value either larger than T or smaller than -T, it will be represented by T or -T, respectively. This procedure results in a transition probability matrix of dimensionality (2T+1)×(2T+1). Of course, again, claimed subject matter is not limited in scope to employing thresholding or to these particular thresholding details. For example, in other embodiments, a threshold level may vary. Nonetheless, for this embodiment, the elements of the four matrices associated with the horizontal, vertical, main diagonal and minor diagonal difference JPEG 2-D arrays are given by:
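$$p\{F_h(u+1,v) = n \mid F_h(u,v) = m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_h(u,v) = m,\ F_h(u+1,v) = n)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_h(u,v) = m)} \qquad (5)$$
$$p\{F_v(u,v+1) = n \mid F_v(u,v) = m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_v(u,v) = m,\ F_v(u,v+1) = n)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_v(u,v) = m)} \qquad (6)$$
$$p\{F_d(u+1,v+1) = n \mid F_d(u,v) = m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_d(u,v) = m,\ F_d(u+1,v+1) = n)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_d(u,v) = m)} \qquad (7)$$
$$p\{F_{md}(u,v+1) = n \mid F_{md}(u+1,v) = m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_{md}(u+1,v) = m,\ F_{md}(u,v+1) = n)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2} \delta(F_{md}(u+1,v) = m)} \qquad (8)$$
where $m \in \{-T, -T+1, \ldots, 0, \ldots, T\}$, $n \in \{-T, -T+1, \ldots, 0, \ldots, T\}$, and
$$\delta(A = m,\ B = n) = \begin{cases} 1, & \text{if } A = m \text{ and } B = n \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$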
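The thresholded one-step transition probability matrix can be estimated by counting neighboring pairs. The sketch below is one possible reading of the procedure, not a definitive implementation: the names `clip` and `transition_matrix`, the offset-based interface, and the choice to count every pair that fits inside the array (rather than any particular summation limits) are assumptions made for illustration. The difference array is assumed to be a list of rows, with the column index corresponding to u and the row index to v.

```python
def clip(value, T):
    """Represent values above T by T and values below -T by -T."""
    return max(-T, min(T, value))

def transition_matrix(diff, T, current=(0, 0), following=(0, 1)):
    """Estimate p{following element = n | current element = m}, m, n in [-T, T].

    `current` and `following` are (row, column) offsets from a common anchor
    position; the defaults pair each element with its right-hand neighbor.
    Entry [m + T][n + T] of the result holds the estimated probability.
    """
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    rows, cols = len(diff), len(diff[0])
    max_dr = max(current[0], following[0])
    max_dc = max(current[1], following[1])
    for r in range(rows - max_dr):
        for c in range(cols - max_dc):
            m = clip(diff[r + current[0]][c + current[1]], T)
            n = clip(diff[r + following[0]][c + following[1]], T)
            counts[m + T][n + T] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for values of m that never occur are left as all zeros.
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix
```

With these offsets, the horizontal case uses the defaults, the vertical case uses `following=(1, 0)`, the main diagonal case uses `following=(1, 1)`, and the minor diagonal case uses `current=(0, 1), following=(1, 0)`.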
In summary, for this embodiment, (2T+1)×(2T+1) elements are obtained for each transition probability matrix. Thus, 4×(2T+1)×(2T+1) elements are produced in total. These elements may be employed as features for steganalysis. In other words, 4×(2T+1)×(2T+1) features have been produced for steganalysis for this particular embodiment.
From the data shown in Figure 3 and Table 1, T in this example is set to 4, although claimed subject matter is not limited in scope in this respect. Hence, for this embodiment, if an element has an absolute value larger than 4, this element is reassigned an absolute value of 4 without a sign change. The resultant transition probability matrix is a 9x9 matrix for each difference JPEG 2-D array. That is, there are 9×9 = 81 elements per transition probability matrix or, equivalently, 81×4 = 324 elements for this particular embodiment. Feature construction for this particular embodiment is illustrated by a block diagram shown in Fig. 5.
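The arithmetic behind the feature count, and the sign-preserving reassignment of large values, can be checked with a short sketch. The helper names `clip_to_threshold` and `feature_count` are hypothetical and serve only to illustrate the numbers quoted above.

```python
def clip_to_threshold(value, T=4):
    """Reassign an absolute value larger than T to T, keeping the sign."""
    return max(-T, min(T, value))

def feature_count(T=4, directions=4):
    """Number of features: one (2T+1) x (2T+1) matrix per direction."""
    return directions * (2 * T + 1) ** 2
```

With T = 4, a difference of 7 is represented by 4 and a difference of -9 by -4, and the four 9x9 matrices together contribute 324 features.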
A variety of techniques are available to analyze data, here referred to as features, in a variety of contexts. In this context, we use the term "analysis of variance process" to refer to processes or techniques that may be applied so that differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation to correlate, segment, classify, analyze or otherwise characterize data based at least in part on application of such processes or techniques. Examples, without intending to limit the scope of claimed subject matter, include: artificial intelligence techniques and processes, including pattern recognition; neural networks; genetic processes; heuristics; and support vector machines (SVM).
Although claimed subject matter is not limited in scope to SVM or SVM processes, SVM may be a convenient approach for two-class classification. See, for example, C. Cortes and V. Vapnik, "Support-vector networks," in Machine Learning, 20, 273-297, Kluwer Academic Publishers, 1995. SVM may, for example, be employed to handle linear and non-linear cases or situations. For linearly separable cases, for example, an SVM classifier may be applied to search for a hyper-plane that separates a positive pattern from a negative pattern.
Thus, while Shi et al., for example, employed neural networks, for this embodiment a support vector machine (SVM) is used as a classifier. SVM is based at least in part on the idea of a hyperplane classifier. It uses Lagrangian multipliers to find a separation hyperplane which distinguishes the positive pattern from the negative pattern. If the feature vectors are one-dimensional (1-D), the separation hyperplane reduces to a point on the number axis. SVM can handle both linearly separable and non-linearly separable cases. Here, training data pairs are denoted by $\{y_i, \omega_i\}$, $i = 1, \ldots, l$, where $y_i \in R^N$ is the feature vector, $N$ is the dimensionality of the feature vectors, and $\omega_i = \pm 1$ for the positive/negative pattern class. In this context, an image with hidden data (stego-image) is considered as a positive pattern while an image without hidden data is considered as a negative pattern, although claimed subject matter is not limited in scope in this respect. A linear support vector approach looks for a hyperplane $H: w^T y + b = 0$ and two hyperplanes $H_1: w^T y + b = -1$ and $H_2: w^T y + b = 1$ parallel to and with substantially equal distances to $H$, with the condition that there are no data points between $H_1$ and $H_2$ and so that the distance between $H_1$ and $H_2$ cannot feasibly be increased, where $w$ and $b$ are the parameters. Once the SVM has been trained, a selection $z$ from the data may be classified using $w$ and $b$.
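Once w and b are known, classifying a new feature vector in the linear case amounts to checking on which side of the hyperplane w^T y + b = 0 it falls. The following sketch illustrates this under stated assumptions: the function names are hypothetical, the +1/-1 labels follow the stego/cover convention used above, and the margin formula 2/||w|| assumes the usual normalization in which the two parallel hyperplanes satisfy w^T y + b = -1 and w^T y + b = +1.

```python
import math

def decision_value(w, b, z):
    """Signed value w^T z + b for a feature vector z."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def classify(w, b, z):
    """Return +1 (stego image) or -1 (image without hidden data)."""
    return 1 if decision_value(w, b, z) >= 0 else -1

def margin_width(w):
    """Distance between the two parallel hyperplanes, 2 / ||w||,
    under the assumed +/-1 normalization."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Training an SVM amounts to choosing w and b so that this margin is as wide as the training data allow; a feature vector that lands exactly on the hyperplane is assigned to the positive class here purely as a tie-breaking convention.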
For a non-linearly separable case, a "learning machine" may map input feature vectors to a higher dimensional space in which a linear hyper-plane may potentially be located. In this embodiment, a transformation from nonlinear feature space to linear higher dimensional space may be performed using a kernel function. Examples of kernels include: linear, polynomial, radial basis function and sigmoid. A linear kernel may be employed in connection with a linear SVM process, for example. Likewise, other kernels may be employed in connection with a non-linear SVM process. For this embodiment, a polynomial kernel was employed.
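For illustration, a polynomial kernel can be written as below. The description does not state the degree or other kernel parameters used, so the parameterization here (gamma, coef0, degree, in the style commonly used by SVM libraries) and its default values are assumptions, as is the function name.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree.

    The parameter values are placeholders chosen for illustration only.
    """
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (gamma * dot + coef0) ** degree
```

Evaluating the kernel on pairs of feature vectors stands in for an inner product in the higher dimensional space, so the mapping itself never needs to be computed explicitly.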
Having formulated an embodiment of a system for identifying or classifying marked content, such as images, for example, it is desirable to evaluate its performance. However, again, we note that this is merely a particular embodiment for purposes of illustration and claimed subject matter is not limited in scope to this particular embodiment or approach.
An image database comprising 7,560 JPEG images with quality factors ranging from 70 to 90 was employed. One third of these images were an essentially random set of pictures taken at different times and places with different digital cameras. The other two thirds were downloaded from the Internet. Each image was cropped (central portion) to the size of either 768x512 or 512x768. Likewise, for purposes of evaluation, chrominance components of the images were set to zero while luminance coefficients were left unaltered before data embedding.
This performance evaluation is focused on detecting OutGuess, F5, and MB1 steganography. The codes for these three approaches are publicly available. See http://www.outguess.org/; http://wwwrn.inf.tu-dresden.de/~westfeld/f5.html; http://redwood.ucdavis.edu/phil/papers/iwdw03.htm. Since there are quite a few zero BDCT coefficients in the JPEG images and the quantity of zero coefficients varies, the data embedding capacity differs from image to image. A common practice is to use the ratio between the length of hidden data and the number of non-zero BDCT AC coefficients as the measure of data embedding capacity for JPEG images. For OutGuess, 0.05, 0.1, and 0.2 bpc (bits per non-zero BDCT AC coefficient) were embedded. The resultant numbers of stego images were 7498, 7452, and 7215, respectively. For F5 and MB1, 0.05, 0.1, 0.2, and 0.4 bpc were embedded, which provides 7560 stego images. Note that the step size of MB1 embedding equals two for this evaluation.
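The bpc measure can be made concrete with a small sketch. It assumes the quantized coefficients are laid out block by block in a 2-D list, with the DC coefficient at the top-left position of each 8x8 block; the function names `nonzero_ac_count` and `embedding_rate_bpc` are hypothetical.

```python
def nonzero_ac_count(coeffs, block=8):
    """Count non-zero AC coefficients in a 2-D array of quantized
    block DCT coefficients (DC assumed at each block's top-left corner)."""
    count = 0
    for r, row in enumerate(coeffs):
        for c, value in enumerate(row):
            is_dc = (r % block == 0) and (c % block == 0)
            if not is_dc and value != 0:
                count += 1
    return count

def embedding_rate_bpc(message_bits, coeffs):
    """Hidden message length in bits per non-zero BDCT AC coefficient."""
    n = nonzero_ac_count(coeffs)
    if n == 0:
        raise ValueError("image has no non-zero AC coefficients")
    return message_bits / n
```

Under this measure, an image with 20,000 non-zero AC coefficients carrying a 2,000-bit message would be embedded at 0.1 bpc.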
One half of the images (and the associated stego images) were randomly selected to train the SVM classifier and the remaining pairs were employed to evaluate the trained classifier. Approaches previously discussed, such as Farid's, Shi et al.'s, and Fridrich's, as well as the previously described embodiment, were applied to evaluate detection of the OutGuess, F5 and MB schemes. The results shown in Table 2 are the arithmetic average of 20 random experiments. Likewise, as mentioned previously, a polynomial kernel was employed. Units here are %; TN stands for true negative rate, TP stands for true positive rate, and AR stands for accuracy.
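The three reported rates can be computed from true and predicted labels as sketched below. The exact formulas used in the evaluation are not spelled out here, so the sketch assumes the standard definitions: TP is the percentage of stego images detected, TN the percentage of cover images correctly passed, and AR the percentage of all test images classified correctly. The function name and the +1/-1 label convention are likewise assumptions.

```python
def detection_rates(true_labels, predicted_labels):
    """Return (TN, TP, AR) in percent; labels are +1 (stego) or -1 (cover)."""
    pairs = list(zip(true_labels, predicted_labels))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = len(pairs) - positives
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    tp_rate = 100.0 * tp / positives
    tn_rate = 100.0 * tn / negatives
    accuracy = 100.0 * (tp + tn) / len(pairs)
    return tn_rate, tp_rate, accuracy
```

Averaging these values over repeated random train/test splits would yield figures comparable in kind to those tabulated.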
Table 2
Likewise, to examine the contributions made by features along different directions, features of reduced dimensionality were evaluated. Hence, features from one direction at a time were used. The results shown in Table 3 are the arithmetic average of 20 random experiments with a polynomial kernel.
Table 3
Comparing the tables, it appears that combining directions enhances the detection rate.
It will, of course, be understood that, although particular embodiments have just been described, the claimed subject matter is not limited in scope to a particular embodiment or implementation. For example, one embodiment may be in hardware, such as implemented to operate on a device or combination of devices, for example, whereas another embodiment may be in software. Likewise, an embodiment may be implemented in firmware, or as any combination of hardware, software, and/or firmware, for example. Likewise, although claimed subject matter is not limited in scope in this respect, one embodiment may comprise one or more articles, such as a storage medium or storage media. This storage media, such as one or more CD-ROMs and/or disks, for example, may have stored thereon instructions that, when executed by a system, such as a computer system, computing platform, or other system, for example, may result in an embodiment of a method in accordance with claimed subject matter being executed, such as one of the embodiments previously described, for example. As one potential example, a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive. For example, a display may be employed to display one or more queries, such as those that may be interrelated, and/or one or more tree expressions, although, again, claimed subject matter is not limited in scope to this example.
In the preceding description, various aspects of claimed subject matter have been described. For purposes of explanation, specific numbers, systems and/or configurations were set forth to provide a thorough understanding of claimed subject matter. However, it should be apparent to one skilled in the art having the benefit of this disclosure that claimed subject matter may be practiced without the specific details. In other instances, well known features were omitted and/or simplified so as not to obscure the claimed subject matter. While certain features have been illustrated and/or described herein, many modifications, substitutions, changes and/or equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and/or changes as fall within the true spirit of claimed subject matter.

Claims

1. A method of training an image classification process comprising: forming at least one coefficient difference array based at least in part on a frequency domain representation of the image; thresholding the at least one coefficient difference array; and training an analysis of variance process using the thresholded at least one coefficient difference array.
2. The method of claim 1, wherein said at least one difference array is in at least one of the following directions: vertical, horizontal or diagonal.
3. The method of claim 1, wherein said analysis of variance process comprises an SVM process.
4. The method of claim 1, wherein said at least one coefficient difference array comprises at least three coefficient difference arrays.
5. The method of claim 4, wherein said at least three coefficient difference arrays comprise a horizontal coefficient difference array, a vertical coefficient difference array and a diagonal coefficient difference array.
6. The method of claim 1, wherein said thresholding comprises nonuniform thresholding.
7. The method of claim 1, wherein said frequency domain representation of the image comprises a block DCT representation.
8. A method of classifying an image comprising: applying a trained analysis of variance process to said image; and classifying said image based at least in part on a value obtained from application of the trained analysis of variance process.
9. The method of claim 8, wherein said trained analysis of variance process comprises a trained SVM process.
10. The method of claim 9, wherein said trained SVM process is based at least in part on coefficient difference arrays.
11. The method of claim 10, wherein said trained SVM process is based at least in part on thresholded coefficient difference arrays.
12. An article comprising: a storage medium having stored thereon instructions that if executed result in performance of a method of training an image classification process as follows: forming at least one coefficient difference array based at least in part on a frequency domain representation of the image; thresholding the at least one coefficient difference array; and training an analysis of variance process using the thresholded at least one coefficient difference array.
13. The article of claim 12, wherein said instructions, if executed, further result in said at least one difference array being in at least one of the following directions: vertical, horizontal or diagonal.
14. The article of claim 12, wherein said instructions if executed further result in said analysis of variance process comprising an SVM process.
15. The article of claim 12, wherein said instructions if executed further result in said at least one coefficient difference array comprising at least three coefficient difference arrays.
16. The article of claim 15, wherein said instructions if executed further result in said at least three coefficient difference arrays comprising a horizontal coefficient difference array, a vertical coefficient difference array and a diagonal coefficient difference array.
17. The article of claim 11, wherein said instructions if executed further result in said thresholding comprising non-uniform thresholding.
18. An article comprising: a storage medium having stored thereon instructions that if executed result in performance of a method of classifying an image comprising: applying a trained analysis of variance process to said image; and classifying said image based at least in part on a value obtained from application of the trained analysis of variance process.
19. The article of claim 18, wherein said instructions if executed further result in said trained analysis of variance process comprising a trained SVM process.
20. The article of claim 24, wherein said instructions if executed further result in said trained SVM process being based at least in part on coefficient difference arrays.
21. The article of claim 20, wherein said instructions if executed further result in said trained SVM process being based at least in part on thresholded coefficient difference arrays.
22. An apparatus comprising: means for forming at least one coefficient difference array based at least in part on a frequency domain representation of the image; means for thresholding the at least one coefficient difference array; and means for training an analysis of variance process using the thresholded at least one coefficient difference array.
23. The apparatus of claim 26, wherein said means for training an analysis of variance process comprises means for training an SVM process.
24. The apparatus of claim 22, wherein said means for said thresholding comprises means for non-uniform thresholding.
25. An apparatus comprising: means for applying a trained analysis of variance process to an image; and means for classifying said image based at least in part on the value obtained from application of the trained analysis of variance process.
26. The apparatus of claim 25, wherein said means for applying a trained analysis of variance process comprises means for applying a trained SVM process.
27. The apparatus of claim 26, wherein said means for applying a trained SVM process is based at least in part on thresholded coefficient difference arrays.
EP06718464A 2006-01-13 2006-01-13 Method for identifying marked images based at least in part on frequency domain coefficient differences Withdrawn EP1971962A4 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2006/001393 WO2007086833A2 (en) 2006-01-13 2006-01-13 Method for identifying marked images based at least in part on frequency domain coefficient differences

Publications (2)

Publication Number Publication Date
EP1971962A2 true EP1971962A2 (en) 2008-09-24
EP1971962A4 EP1971962A4 (en) 2011-04-06

Family

ID=38309629

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06718464A Withdrawn EP1971962A4 (en) 2006-01-13 2006-01-13 Method for identifying marked images based at least in part on frequency domain coefficient differences

Country Status (3)

Country Link
EP (1) EP1971962A4 (en)
JP (1) JP4920045B2 (en)
WO (1) WO2007086833A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR0132894B1 (en) * 1992-03-13 1998-10-01 강진구 Image compression coding and decoding method and apparatus
US20020183984A1 (en) * 2001-06-05 2002-12-05 Yining Deng Modular intelligent multimedia analysis system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FARID H ET AL: "How Realistic is Photorealistic?", IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 53, no. 2, 1 February 2005 (2005-02-01), pages 845-850, XP011125221, ISSN: 1053-587X, DOI: DOI:10.1109/TSP.2004.839896 *
FARID H: "Detecting steganographic messages in digital images", 20010101, 1 January 2001 (2001-01-01), pages 1-9, XP002468134, *
GUO-SHIANG LIN ET AL: "Data hiding domain classification for blind image steganalysis", 2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME): JUNE 27 - 30, 2004, TAIPEI, TAIWAN, PISCATAWAY, NJ : IEEE OPERATIONS CENTER, US, vol. 2, 27 June 2004 (2004-06-27), pages 907-910, XP010770967, DOI: DOI:10.1109/ICME.2004.1394348 ISBN: 978-0-7803-8603-7 *
See also references of WO2007086833A2 *

Also Published As

Publication number Publication date
WO2007086833A2 (en) 2007-08-02
EP1971962A4 (en) 2011-04-06
JP2009527930A (en) 2009-07-30
JP4920045B2 (en) 2012-04-18
WO2007086833A3 (en) 2009-05-28

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20080717

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK YU

R17D Deferred search report published (corrected)

Effective date: 20090528

A4 Supplementary search report drawn up and despatched

Effective date: 20110303

17Q First examination report despatched

Effective date: 20110316

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

DAX Request for extension of the european patent (deleted)
18D Application deemed to be withdrawn

Effective date: 20120221