WO2007081344A1 - Method for identifying marked content - Google Patents


Info

Publication number
WO2007081344A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction error
images
content
image
analysis
Prior art date
Application number
PCT/US2006/001338
Other languages
French (fr)
Inventor
Dekun Zou
Yun-Qing Shi
Original Assignee
New Jersey Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Jersey Institute Of Technology filed Critical New Jersey Institute Of Technology
Priority to JP2008550282A priority Critical patent/JP2009524078A/en
Priority to PCT/US2006/001338 priority patent/WO2007081344A1/en
Priority to EP06718417A priority patent/EP1971961A4/en
Publication of WO2007081344A1 publication Critical patent/WO2007081344A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking

Definitions

  • This application is related to classifying or identifying content, such as marked content, for example.
  • FIGS. 1A-C are schematic diagrams illustrating one embodiment of a prediction error model as applied to content, such as an image.
  • Fridrich et al. have shown that the number of zeros in a block DCT domain of a stego-image will increase if the F5 embedding method is applied to generate the stego-image. This feature may be used to determine whether hidden messages have been embedded with the F5 method in content, for example.
  • There are other findings regarding steganalysis of particularly targeted data hiding methods. See, for example, J. Fridrich, M. Goljan and R. Du, "Detecting LSB steganography in color and gray-scale images", Magazine of IEEE Multimedia Special Issue on Security, Oct.-Nov. 2001, pp. 22-28; and R. Chandramouli and N. Memon, "
  • Lyu and Farid proposed a more general steganalysis method based at least in part on image high order statistics, derived from image decomposition with separable quadrature mirror filters.
  • the wavelet high-frequency subbands' high order statistics are extracted as features for steganalysis in this approach.
  • this approach has been shown to differentiate stego-images from cover images with a certain success rate.
  • Data hiding methods addressed by this particular steganalysis primarily comprise least significant bit-plane (LSB) modification type steganographic tools.
  • LSB bit-plane
  • SS spread spectrum
  • a steganalysis system based at least in part on a 2-D Markov chain of thresholded prediction-error sets for content, such as images, for example, is described below, although claimed subject matter is not limited in scope in this respect.
  • content samples such as, for example, image pixels
  • a prediction-error image for example, is generated by subtracting the prediction value from the pixel value and thresholding.
  • Empirical transition matrixes along the horizontal, vertical and diagonal directions of Markov chains may, in such an embodiment serve as features for steganalysis.
  • a steganalysis system based at least in part on a Markov chain model of thresholded prediction-error images may be applied. Image pixels are predicted with the neighboring pixels. Prediction error in this particular embodiment is obtained by subtracting the prediction values from the pixel value. Though the range of the difference values is increased, the majority of the difference values may be concentrated in a relatively small range near zero owing to a correlation between neighboring pixels in unmarked images.
  • marked content refers to content in which data has been hidden so that it is not apparent that the content contains such hidden information.
  • unmarked or cover content refers to content in which data has not been hidden.
  • Large values in a prediction-error image may be attributed at least in part to image content rather than data hiding. Therefore, a threshold applied to prediction error may reduce or remove large values in the prediction error images, thus limiting the dynamic range of a prediction-error image.
  • prediction-error images may be modeled using a Markov chain.
  • An empirical transition matrix is calculated and serves as features for steganalysis. Owing at least in part to thresholding, the size of empirical transition matrixes is decreased to a manageable size for classifiers so that probabilities in the matrixes may be included in feature vectors.
  • an analysis of variance or other statistical approach may be applied. For example, an SVM process may be applied with both linear and non-linear kernels used for classification, as described in more detail below.
  • analysis of variance process refers to a process in which differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation that correlation, segmentation, analysis, classification or other characterization of the data based at least in part on such a process may be performed.
  • steganalysis may have a variety of meanings, for the purpose of this particular embodiment, it refers to a two-class pattern classification approach. For example, a test image may be classified as either a cover image, namely, information is not hidden in it, or a stego-image or marked image, which carries hidden data or hidden messages.
  • the classification comprises two parts, although claimed subject matter is not limited in scope to employing only two classifications.
  • the feature may represent the shape and color of an object.
  • other properties may provide useful information.
  • steganalysis for example, it is desirable to have a feature contain information about changes incurred by data hiding as opposed to information about the content of the image.
  • unmarked images may tend to exhibit particular properties, such as continuous, smooth, and having a correlation between neighboring pixels.
  • hidden data may be independent of the content itself.
  • a watermarking process for example, may change continuity with respect to the unmarked content because it may introduce some amount of random variation, for example. As a result, it may reduce correlation among adjacent pixels, bit-planes and image blocks.
  • this potential variation that may be attributed to data hiding is amplified. This may be accomplished by any one of a number of possible approaches and claimed subject matter is not limited in scope to a particular approach. However, below, one particular embodiment for accomplishing this is described.
  • neighboring pixels may be used to predict the current pixel.
  • the predictions may be made in three directions. Again, for this embodiment, these directions include horizontal, vertical and diagonal, although in other embodiments other directions are possible.
  • prediction error may be estimated or obtained by subtracting a predicted pixel value from an original pixel value as shown in (1),
  • e h (i, j) indicates prediction error for pixel (i, j) along a horizontal direction
  • e v (i, j) indicates prediction error for pixel (i, j) along a vertical direction
  • e d (i, j) indicates prediction error for pixel (i, j) along a diagonal direction, respectively.
  • a threshold T may be adopted, and the prediction errors may be adjusted according to the following rule:
  • T may not comprise a fixed value. For example, it may vary with time, location, and a host of other potential factors.
  • large prediction errors may be treated as 0.
  • image pixels or other content samples may be regarded as smooth from the data hiding point of view.
  • the value range of a prediction-error image is [-T, T], with 2*T+1 possible values.
  • FIG. 1A is a schematic diagram illustrating an embodiment of a transition model for horizontal prediction-error image E h , in which a Markov chain is modeled along the horizontal direction, for example.
  • FIG. 1B and FIG. 1C are schematic diagrams illustrating corresponding embodiments for E v and E d , respectively.
  • elements of the empirical transition matrices for E h , E v and E d in this embodiment are employed as features.
  • one circle represents one pixel.
  • the diagrams show an image of size 8 by 8.
  • the arrows represent the state change in a Markov chain.
  • analysis of variance process refers to processes or techniques that may be applied so that differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation to correlate, segment, classify, analyze or otherwise characterize the data based at least in part on application of such processes or techniques.
  • examples, without intending to limit the scope of claimed subject matter, include: artificial intelligence techniques and processes; neural networks; genetic processes; heuristics; and support vector machines (SVM).
  • SVM may, for example, be employed to handle linear and non-linear cases or situations.
  • linear support vector processes may be formulated as follows. If a separating hyper-plane exists, training data satisfies the following constraints:
  • a Lagrangian formulation may likewise be constructed as follows:
  • α_i is the positive Lagrange multiplier introduced for inequality constraints, here (3) & (4).
  • the gradient of L with respect to w and b provides:
  • a sample z from testing data may be classified using w and b. For example, in one embodiment, if w'z + b is greater than or equal to zero, the image may be classified as having a hidden message. Otherwise, it may be classified as not containing a hidden message.
  • a "learning machine” may map input feature vectors to a higher dimensional space in which a linear hyper-plane may potentially be located.
  • a transformation from nonlinear feature space to linear higher dimensional space may be performed using a kernel function.
  • kernels include: linear, polynomial, radial basis function and sigmoid.
  • a linear kernel may be employed in connection with a linear SVM process, for example.
  • kernels may be employed in connection with a non-linear SVM process.
  • identifying or classifying marked content such as images, for example, it is desirable to construct and evaluate performance.
  • this is merely a particular embodiment for purposes of illustration and claimed subject matter is not limited in scope to this particular embodiment or approach.
  • Typical data hiding methods were applied to the images, such as: Cox et al.'s non-blind SS data hiding method, see I. J. Cox, J.Kilian, T.Leighton and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans, on Image Processing, 6, 12, 1673-1687, (1997); Piva et al.'s blind SS, see A.Piva, M.Barni, E.Bartolini, V.Cappellini, "DCT-based watermark recovering without resorting to the uncorrupted original image", Proc. ICIP 97, vol.
  • the threshold T was set to be 4, although, as previously indicated, claimed subject matter is not limited in scope to a fixed threshold value, or to an integer value.
  • Effective prediction error values in this example range from [-4 to 4], with 9 different values in total. Therefore, the dimension of the transition matrix is 9 by 9, which is 81 features for an error image. Since we have three error images in three different directions, the number of total features is 243 for an image in this particular example, although, again, claimed subject matter is not limited in scope in this respect.
  • TN stands for “True Negative”, here, the detection rate of original cover images.
  • TP stands for “True Positive”, here, the detection rate of stego-images.
  • Average is the arithmetic mean of these two rates. In other words, it is the overall correct classification rate for all test images.
  • this particular embodiment has a True Positive rate of over 90% for Cox's SS, Piva's blind SS, QIM and LSB with embedding strength over 0.1 bpp.
  • Embedded data here comprises images with sizes ranging from 32x32 to 194x194.
  • Corresponding embedding data rates are from 0.02 bpp to 0.9 bpp and detection rates range from 1.9% to 78%.
  • this particular embodiment appears to outperform the approach shown in Lyu and Farid.
  • one embodiment may be in hardware, such as implemented to operate on a device or combination of devices, for example, whereas another embodiment may be in software.
  • an embodiment may be implemented in firmware, or as any combination of hardware, software, and/or firmware, for example.
  • one embodiment may comprise one or more articles, such as a storage medium or storage media.
  • This storage media such as, one or more CD-ROMs and/or disks, for example, may have stored thereon instructions, that when executed by a system, such as a computer system, computing platform, or other system, for example, may result in an embodiment of a method in accordance with claimed subject matter being executed, such as one of the embodiments previously described, for example.
  • a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive.
  • a display may be employed to display one or more queries, such as those that may be interrelated, and/or one or more tree expressions, although, again, claimed subject matter is not limited in scope to this example.

Abstract

Briefly, in accordance with one embodiment, a method of identifying marked content is described.

Description

METHOD FOR IDENTIFYING MARKED CONTENT
FIELD
This application is related to classifying or identifying content, such as marked content, for example.
BACKGROUND
In recent years digital data hiding has become an active research field. Various kinds of data hiding methods have been proposed. Some methods aim at content protection, and/or authentication, while some aim at covert communication. The latter category of data hiding is referred to here as steganography.
BRIEF DESCRIPTION OF THE DRAWINGS
Subject matter is particularly pointed out and distinctly claimed in the concluding portion of the specification. Claimed subject matter, however, both as to organization and method of operation, together with objects, features, and/or advantages thereof, may best be understood by reference to the following detailed description if read with the accompanying drawings in which:
FIGS. 1A-C are schematic diagrams illustrating one embodiment of a prediction error model as applied to content, such as an image.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of claimed subject matter. However, it will be understood by those skilled in the art that claimed subject matter may be practiced without these specific details. In other instances, well known methods, procedures, components and/or circuits have not been described in detail so as not to obscure claimed subject matter.
Some portions of the detailed description which follow are presented in terms of algorithms and/or symbolic representations of operations on data bits and/or binary digital signals stored within a computing system, such as within a computer and/or computing system memory. These algorithmic descriptions and/or representations are the techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations and/or similar processing leading to a desired result. The operations and/or processing may involve physical manipulations of physical quantities. Typically, although not necessarily, these quantities may take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared and/or otherwise manipulated. It has proven convenient, at times, principally for reasons of common usage, to refer to these signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals and/or the like. It should be understood, however, that all of these and similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as "processing", "computing", "calculating", "determining" and/or the like refer to the actions and/or processes of a computing platform, such as a computer or a similar electronic computing device, that manipulates and/or transforms data represented as physical electronic and/or magnetic quantities and/or other physical quantities within the computing platform's processors, memories, registers, and/or other information storage, transmission, and/or display devices.
In recent years digital data hiding has become an active research field. Various kinds of data hiding methods have been proposed. Some methods aim at content protection, and/or authentication, while some aim at covert communication. The latter category of data hiding is referred to in this context as steganography.
In J. Fridrich, M. Goljan and D. Hogea, "Steganalysis of JPEG Images: Breaking the F5 algorithm", 5th Information Hiding Workshop, 2002, pp. 310-323, (hereinafter "Fridrich et al."), Fridrich et al. have shown that the number of zeros in a block DCT domain of a stego-image will increase if the F5 embedding method is applied to generate the stego-image. This feature may be used to determine whether hidden messages have been embedded with the F5 method in content, for example. There are other findings regarding steganalysis of particularly targeted data hiding methods. See, for example, J. Fridrich, M. Goljan and R. Du, "Detecting LSB steganography in color and gray-scale images", Magazine of IEEE Multimedia Special Issue on Security, Oct.-Nov. 2001, pp. 22-28; and R. Chandramouli and N. Memon, "Analysis of LSB based image steganography techniques", Proc. of ICIP 2001, Oct. 7-10, 2001.
In S. Lyu and H. Farid, "Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines," 5th International Workshop on Information Hiding, Noordwijkerhout, The Netherlands, 2002 (hereinafter, "Lyu and Farid"), Lyu and Farid proposed a more general steganalysis method based at least in part on image high order statistics, derived from image decomposition with separable quadrature mirror filters. The wavelet high-frequency subbands' high order statistics are extracted as features for steganalysis in this approach. Likewise, this approach has been shown to differentiate stego-images from cover images with a certain success rate. Data hiding methods addressed by this particular steganalysis primarily comprise least significant bit-plane (LSB) modification type steganographic tools.
In K. Sullivan, U. Madhow, S. Chandrasekaran, and B. S. Manjunath, "Steganalysis of Spread Spectrum Data Hiding Exploiting Cover Memory", SPIE2005, vol. 5681, pp. 38-46, (hereinafter, "Sullivan et al.") a steganalysis method based at least in part on a hidden Markov model is proposed. The empirical transition matrix of a test image is formed in such an approach. However, the size of the empirical transition matrix is large, e.g., 65536 elements (256 by 256) for a grey level image with a bit depth of 8. Thus, the matrix is not used as features directly. The authors select several of the largest probabilities along the main diagonal together with their neighbors, and randomly select some other probabilities along the main diagonal as features. Unfortunately, some useful information might be ignored due at least in part to the random fashion of feature formulation. The data hiding methods addressed by Sullivan et al. relate primarily to spread spectrum (SS) data hiding methods. Although these latter methods may not carry as many information bits as LSB methods in general, SS methods may be used in connection with covert communications, for example. In addition, SS methods are known to be more robust than LSB methods. Therefore, it is desirable to consider SS methods for steganalysis.
One embodiment of a steganalysis system based at least in part on a 2-D Markov chain of thresholded prediction-error sets for content, such as images, for example, is described below, although claimed subject matter is not limited in scope in this respect. In this particular embodiment, content samples, such as, for example, image pixels, are predicted with their neighboring pixels, and a prediction-error image, for example, is generated by subtracting the prediction value from the pixel value and thresholding. Empirical transition matrices along the horizontal, vertical and diagonal directions of Markov chains may, in such an embodiment, serve as features for steganalysis. Analysis of variance type approaches, such as, for example, support vector machines (SVM) or genetic processes, may be applied for classification or identification, although, again, claimed subject matter is not limited in scope in this respect.
Continuing with this particular embodiment, although claimed subject matter is not limited in scope to only one embodiment, a steganalysis system based at least in part on a Markov chain model of thresholded prediction-error images may be applied. Image pixels are predicted with the neighboring pixels. Prediction error in this particular embodiment is obtained by subtracting the prediction values from the pixel value. Though the range of the difference values is increased, the majority of the difference values may be concentrated in a relatively small range near zero owing to a correlation between neighboring pixels in unmarked images. In this context, the term marked content refers to content in which data has been hidden so that it is not apparent that the content contains such hidden information. Likewise, unmarked or cover content refers to content in which data has not been hidden. Large values in a prediction-error image, however, may be attributed at least in part to image content rather than data hiding. Therefore, a threshold applied to prediction error may reduce or remove large values in the prediction-error images, thus limiting the dynamic range of a prediction-error image.
In this particular embodiment, although claimed subject matter is not limited in scope in this respect, prediction-error images may be modeled using a Markov chain. An empirical transition matrix is calculated and serves as features for steganalysis. Owing at least in part to thresholding, the size of empirical transition matrices is decreased to a manageable size for classifiers so that probabilities in the matrices may be included in feature vectors. For feature classification, an analysis of variance or other statistical approach may be applied. For example, an SVM process may be applied with both linear and non-linear kernels used for classification, as described in more detail below. In this context, the term "analysis of variance process" refers to a process in which differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation that correlation, segmentation, analysis, classification or other characterization of the data based at least in part on such a process may be performed.
While the term steganalysis may have a variety of meanings, for the purpose of this particular embodiment, it refers to a two-class pattern classification approach. For example, a test image may be classified as either a cover image, namely, information is not hidden in it, or a stego-image or marked image, which carries hidden data or hidden messages. Generally, in this particular approach or embodiment, the classification comprises two parts, although claimed subject matter is not limited in scope to employing only two classifications. Other approaches are possible and are included within the scope of claimed subject matter. Here, these parts are referred to as feature extraction and pattern classification, respectively. In many instances, it would be desirable to use the image itself for features in this process due at least in part to the large amount of information it contains. However, likewise, from a feasibility standpoint, the dimensionality of features may be too high for most classifiers. Therefore, feature extraction may be applied.
For computer vision type situations, it may be desirable for the feature to represent the shape and color of an object. For this particular embodiment, in contrast, other properties may provide useful information. In steganalysis, for example, it is desirable to have a feature contain information about changes incurred by data hiding as opposed to information about the content of the image.
Generally speaking, unmarked images, for example, may tend to exhibit particular properties, such as being continuous and smooth and having a correlation between neighboring pixels. Likewise, hidden data may be independent of the content itself. A watermarking process, for example, may change continuity with respect to the unmarked content because it may introduce some amount of random variation, for example. As a result, it may reduce correlation among adjacent pixels, bit-planes and image blocks. In this particular embodiment, it would be desirable if this potential variation that may be attributed to data hiding is amplified. This may be accomplished by any one of a number of possible approaches and claimed subject matter is not limited in scope to a particular approach. However, below, one particular embodiment for accomplishing this is described.
In this particular embodiment, neighboring pixels may be used to predict the current pixel. For this embodiment, the predictions may be made in three directions. Again, for this embodiment, these directions include horizontal, vertical and diagonal, although in other embodiments other directions are possible. For a prediction, prediction error may be estimated or obtained by subtracting a predicted pixel value from an original pixel value as shown in (1),
\[
\begin{aligned}
e_h(i,j) &= x(i+1,j) - x(i,j) \\
e_v(i,j) &= x(i,j+1) - x(i,j) \\
e_d(i,j) &= x(i+1,j+1) - x(i,j)
\end{aligned}
\tag{1}
\]
where e_h(i, j) indicates prediction error for pixel (i, j) along a horizontal direction, e_v(i, j) indicates prediction error for pixel (i, j) along a vertical direction and e_d(i, j) indicates prediction error for pixel (i, j) along a diagonal direction, respectively. For a pixel of an image, we therefore estimate three prediction errors in this embodiment. At this point, prediction errors will form three prediction-error images denoted here by E_h, E_v and E_d, respectively.
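The three prediction-error images may be sketched in a few lines of Python. This is only an illustration, not the claimed implementation: it assumes the horizontal and diagonal errors mirror the vertical formula, that is, e_h(i, j) = x(i+1, j) - x(i, j) and e_d(i, j) = x(i+1, j+1) - x(i, j), and it simply drops border pixels that lack the required neighbor. The function name `prediction_errors` and the sample image are hypothetical.

```python
def prediction_errors(x):
    """Return (E_h, E_v, E_d) for a 2-D list of pixel values x[i][j].

    Assumed index convention, following e_v(i, j) = x(i, j+1) - x(i, j):
    the horizontal neighbor is x[i+1][j], the vertical neighbor is
    x[i][j+1], and the diagonal neighbor is x[i+1][j+1].
    Border pixels without the needed neighbor are skipped (an assumption).
    """
    n_i, n_j = len(x), len(x[0])
    e_h = [[x[i + 1][j] - x[i][j] for j in range(n_j)]
           for i in range(n_i - 1)]
    e_v = [[x[i][j + 1] - x[i][j] for j in range(n_j - 1)]
           for i in range(n_i)]
    e_d = [[x[i + 1][j + 1] - x[i][j] for j in range(n_j - 1)]
           for i in range(n_i - 1)]
    return e_h, e_v, e_d


# Tiny hypothetical grey-level image; the jump to 40 and 41 mimics an object edge.
image = [[10, 12, 11],
         [10, 13, 40],
         [9, 12, 41]]
E_h, E_v, E_d = prediction_errors(image)
```

Most differences stay near zero, while the object edge produces values such as 27 to 29, which is the kind of content-driven error the threshold is meant to suppress.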
It is observed that potential distortions introduced by data hiding may usually be small compared with differences between pixels associated with, for example, the presence of different objects in an image. Otherwise, distortion itself may suggest hidden data if inspected by human eyes, thus potentially undermining the covert communication. Therefore, large prediction errors may tend to reflect image content rather than hidden data. For this particular embodiment, to address this, a threshold T may be adopted, and the prediction errors may be adjusted according to the following rule:
\[
e(i,j) =
\begin{cases}
e(i,j), & |e(i,j)| \le T \\
0, & \text{otherwise}
\end{cases}
\tag{2}
\]
It is noted, of course, that claimed subject matter is not limited in scope to this particular approach. Many other approaches to address the prediction error are possible and intended to be included within the scope of claimed subject matter. Likewise, T, depending, for example, on the particular embodiment, may not comprise a fixed value. For example, it may vary with time, location, and a host of other potential factors.
Nonetheless, continuing with this example, large prediction errors may be treated as 0. In other words, image pixels or other content samples may be regarded as smooth from the data hiding point of view. Continuing with this specific example, the value range of a prediction-error image is [-T, T], with 2*T+1 possible values.
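A minimal sketch of this thresholding step, assuming a fixed integer threshold and a prediction-error image stored as a list of rows; the name `threshold_errors` and the sample values are illustrative only.

```python
def threshold_errors(e, T):
    """Keep errors with magnitude at most T and treat larger ones as 0."""
    return [[v if abs(v) <= T else 0 for v in row] for row in e]


# Hypothetical prediction-error image containing two large, content-driven errors.
E = [[0, 1, 29],
     [-1, -5, 4]]
T = 4
E_T = threshold_errors(E, T)

# After thresholding every value lies in [-T, T], so 2*T + 1 values are possible.
num_states = 2 * T + 1
```

With T = 4, the errors 29 and -5 are set to 0 and the remaining values are unchanged, leaving nine possible states.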
Likewise, for this embodiment, a 2-D Markov chain model is applied to the thresholded prediction-error images, rather than 1-D, for example. FIG. 1A is a schematic diagram illustrating an embodiment of a transition model for horizontal prediction-error image E_h, in which a Markov chain is modeled along the horizontal direction, for example. FIG. 1B and FIG. 1C are schematic diagrams illustrating corresponding embodiments for E_v and E_d, respectively. As suggested previously, and explained in more detail below, elements of the empirical transition matrices for E_h, E_v and E_d in this embodiment are employed as features. In FIGS. 1A-C, one circle represents one pixel. The diagrams show an image of size 8 by 8. The arrows represent the state change in a Markov chain.
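One way to picture the empirical transition matrix is as a table of one-step transition frequencies between neighboring thresholded error values. The sketch below is a simplification under stated assumptions: it counts transitions from position (i, j) to (i + di, j + dj), normalizes each row into probabilities, and leaves rows with no observations at zero. The function name `transition_matrix`, the step parameters `di` and `dj`, and the sample data are hypothetical, and the exact traversal shown in FIGS. 1A-C may differ.

```python
def transition_matrix(e, T, di, dj):
    """Empirical one-step transition probabilities for a thresholded error image.

    e      : 2-D list of integers in [-T, T]
    di, dj : non-negative step to the neighbor, e.g. (1, 0), (0, 1) or (1, 1)
    Returns a (2T+1) x (2T+1) list of rows; row m + T holds the estimated
    probabilities of moving from value m to each value n in [-T, T].
    """
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    n_i, n_j = len(e), len(e[0])
    for i in range(n_i - di):
        for j in range(n_j - dj):
            m = e[i][j] + T            # current state, shifted to 0..2T
            n = e[i + di][j + dj] + T  # neighboring state, shifted to 0..2T
            counts[m][n] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical thresholded error image with T = 1 (values in {-1, 0, 1}).
E = [[0, 1],
     [0, 0],
     [1, -1]]
P_h = transition_matrix(E, T=1, di=1, dj=0)

# Flatten the matrix into a feature vector of (2T+1)^2 entries.
features = [p for row in P_h for p in row]
```

Repeating the call with steps (0, 1) and (1, 1) on the corresponding error images would give three such vectors, which could then be concatenated into one feature vector per image.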
A variety of techniques are available to analyze data in a variety of contexts. In this context, we use the term "analysis of variance process" to refer to processes or techniques that may be applied so that differences attributable to statistical variation are sufficiently distinguished from differences attributable to non-statistical variation to correlate, segment, classify, analyze or otherwise characterize the data based at least in part on application of such processes or techniques. Examples, without intending to limit the scope of claimed subject matter, include: artificial intelligence techniques and processes; neural networks; genetic processes; heuristics; and support vector machines (SVM).
Although claimed subject matter is not limited in scope to SVM or SVM processes, it may be a convenient approach for two-class classification. See, for example, C. Cortes and V. Vapnik, "Support-vector networks," in Machine Learning, 20, 273-297, Kluwer Academic Publishers, 1995. SVM may, for example, be employed to handle linear and non-linear cases or situations. For linearly separable cases, for example, an SVM classifier may be applied to search for a hyper-plane that separates a positive pattern from a negative pattern. For example, one may denote training data pairs \(\{y_i, \omega_i\}\), \(i = 1, \ldots, l\), where \(y_i\) is a feature vector, and \(\omega_i = \pm 1\) for positive/negative pattern.
For this particular embodiment, linear support vector processes may be formulated as follows. If a separating hyper-plane exists, training data satisfies the following constraints:
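\[ w' y_i + b \ge 1 \quad \text{if } \omega_i = +1 \tag{3} \]
\[ w' y_i + b \le -1 \quad \text{if } \omega_i = -1 \tag{4} \]
A Lagrangian formulation may likewise be constructed as follows:
\[ L = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \omega_i \left( w' y_i + b \right) + \sum_{i=1}^{l} \alpha_i \tag{5} \]
where \(\alpha_i\) is the positive Lagrange multiplier introduced for inequality constraints, here (3) & (4), and \(l\) is the number of training pairs. The gradient of L with respect to w and b provides:
\[ w = \sum_{i=1}^{l} \alpha_i \omega_i y_i \quad \text{and} \quad \sum_{i=1}^{l} \alpha_i \omega_i = 0 \tag{6} \]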
In this embodiment, by training an SVM classifier, a sample z from testing data may be classified using w and b. For example, in one embodiment, if w'z + b is greater than or equal to zero, the image may be classified as having a hidden message. Otherwise, it may be classified as not containing a hidden message. Of course, this is a particular embodiment and claimed subject matter is not limited in scope in this respect. For example, conventions regarding positive, negative or functional form may vary depending on a variety of factors and situations.
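The decision rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the names `weight_from_multipliers` and `classify_linear` are hypothetical, the weight vector is formed with the standard linear SVM relation w = Σ α_i ω_i y_i, and the label convention (+1 for a suspected hidden message, -1 otherwise) is one possible choice.

```python
def weight_from_multipliers(alphas, labels, vectors):
    """Form w as the sum of alpha_i * omega_i * y_i over the training pairs."""
    dim = len(vectors[0])
    w = [0.0] * dim
    for alpha, omega, y in zip(alphas, labels, vectors):
        for k in range(dim):
            w[k] += alpha * omega * y[k]
    return w


def classify_linear(w, b, z):
    """Return +1 (hidden message suspected) if w'z + b >= 0, otherwise -1."""
    if len(w) != len(z):
        raise ValueError("w and z must have the same length")
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1
```

Only training pairs with non-zero multipliers contribute to w, so in practice the sum runs over the support vectors.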
For a non-linearly separable case, a "learning machine" may map input feature vectors to a higher dimensional space in which a linear hyper-plane may potentially be located. In this embodiment, a transformation from nonlinear feature space to linear higher dimensional space may be performed using a kernel function. Examples of kernels include: linear, polynomial, radial basis function and sigmoid. For this particular embodiment, a linear kernel may be employed in connection with a linear SVM process, for example.
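For reference, the four kernel families named above may be written out as follows. This is a sketch only: the parameter names `gamma`, `coef0` and `degree` follow the conventions used by LIBSVM, while the default values and function names are assumptions chosen for illustration and are not taken from the embodiment.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # Radial basis function of the squared Euclidean distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)
```

With a kernel K, the inner product in the decision function is replaced by evaluations of K between the test sample and the training samples.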
Likewise, other kernels may be employed in connection with a non-linear SVM process. Having formulated an embodiment of a system for identifying or classifying marked content, such as images, for example, it is desirable to construct the system and evaluate its performance. However, again, we note that this is merely a particular embodiment for purposes of illustration and claimed subject matter is not limited in scope to this particular embodiment or approach.
For evaluation purposes, 2812 images were downloaded from the website of Vision Research Lab, University of California, Santa Barbara, see http://vision.ece.ucsb.edu/~sullivak/Research imgs/, and 1096 sample images included in the CorelDRAW Version 10.0 software CD#3, see www.corel.com.
Thus, 3908 images were employed as a test image dataset. Color images were converted to grey level images applying an Irreversible Color Transform, such as illustrated by (7) below, see, for example, M. Rabbani and R. Joshi, "An Overview of the JPEG2000 Still Image Compression Standard", Signal Processing: Image Communication 17 (2002) 3-48:
$$Y = 0.299R + 0.587G + 0.114B \qquad (7)$$
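A small Python sketch of this grey-level conversion is given below. The weights are those of equation (7); the function names and the rounding to integer grey levels are assumptions made for this example.

```python
def to_luminance(r, g, b):
    """Luminance of an RGB triple using the weights of equation (7)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def rgb_image_to_grey(image):
    """Convert rows of (R, G, B) tuples to rows of integer grey levels.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [[int(round(to_luminance(r, g, b))) for (r, g, b) in row]
            for row in image]
```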
Typical data hiding methods were applied to the images, such as: Cox et al.'s non-blind SS data hiding method, see I. J. Cox, J. Kilian, T. Leighton and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. on Image Processing, 6, 12, 1673-1687, (1997); Piva et al.'s blind SS, see A. Piva, M. Barni, E. Bartolini, V. Cappellini, "DCT-based watermark recovering without resorting to the uncorrupted original image", Proc. ICIP 97, vol. 1, pp. 520; a generic quantization index modulation (QIM) data hiding method, see B. Chen and G.W. Wornell, "Digital watermarking and information embedding using dither modulation," Proceedings of IEEE MMSP 1998, pp. 273-278 (here with a step size of 5 and an embedding rate of 0.1 bpp); and generic LSB. For these data hiding methods, different random or quasi-random signals were embedded into different images. For generic LSB data hiding, embedding positions were randomly selected for different images. Therefore, this approach may be applied to steganographic tools that use LSB as the message embedding method. Various data embedding rates ranging from 0.3 bpp to as low as 0.01 bpp were applied. This range of embedding rates is comparable to that reported in the aforementioned Lyu and Farid for those LSB based stego tools. However, this evaluation might be considered more general due at least in part to embedding position selection.
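To make the generic LSB case concrete, the sketch below embeds message bits into the least significant bits of randomly selected pixels at a given rate in bits per pixel (bpp). It is an illustration under assumptions, not the tool used in the evaluation: the function names are hypothetical, the `seed` argument stands in for whatever key or random source selects positions, and the image is treated as a flat list of 8-bit grey levels.

```python
import random


def bits_for_rate(num_pixels, rate_bpp):
    """Number of message bits for an embedding rate in bits per pixel."""
    return int(round(num_pixels * rate_bpp))


def embed_lsb(pixels, bits, seed=0):
    """Embed bits into the LSBs of randomly chosen, distinct pixel positions.

    pixels: flat list of integer grey levels (0-255).
    bits: list of 0/1 message bits, with len(bits) <= len(pixels).
    seed: hypothetical stand-in for a shared key that selects positions.
    Returns (stego_pixels, positions).
    """
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover")
    rng = random.Random(seed)
    positions = rng.sample(range(len(pixels)), len(bits))
    stego = list(pixels)
    for pos, bit in zip(positions, bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[pos] = (stego[pos] & ~1) | bit
    return stego, positions


def extract_lsb(stego, positions):
    """Read the message bits back from the chosen positions."""
    return [stego[pos] & 1 for pos in positions]
```

At 0.1 bpp, for instance, a 100-pixel cover carries 10 message bits, and each modified pixel changes by at most one grey level.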
In this particular experimental evaluation, the threshold T was set to 4, although, as previously indicated, claimed subject matter is not limited in scope to a fixed threshold value or to an integer value. Effective prediction error values in this example range from -4 to 4, with 9 different values in total. Therefore, the dimension of the transition matrix is 9 by 9, which yields 81 features for an error image. Since we have three error images in three different directions, the number of total features is 243 for an image in this particular example, although, again, claimed subject matter is not limited in scope in this respect.
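The feature count above follows from simple arithmetic, which the following Python sketch reproduces. The function names are hypothetical, and clipping out-of-range values to -T or T is an assumed form of thresholding used here only for illustration.

```python
def threshold_errors(err, T):
    """Clip every prediction error to [-T, T] (clipping is assumed here)."""
    return [[max(-T, min(T, e)) for e in row] for row in err]


def feature_count(T, num_directions=3):
    """Transition-matrix entries summed over all error images."""
    states = 2 * T + 1          # values -T, ..., 0, ..., T
    return num_directions * states * states
```

With T = 4 there are 9 states, so one error image yields 9 × 9 = 81 features and three error images yield 243.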
For an image in the image database, stego-images with the above-mentioned data hiding methods were generated. The system was evaluated in turn with the data hiding methods discussed above. A randomly or quasi-randomly selected half of the original images, and the corresponding stego-images, were used for training. The remaining pairs of original and corresponding stego-images were put through the trained SVM, in this embodiment, to evaluate performance. Here, the detection rate is defined as the ratio of the number of correctly classified images with respect to the number of test images. Each test was repeated 20 times, and the experimental data below represent averages over those repetitions. Initially, a linear SVM process was applied. Linear SVM has an advantage of relatively fast training. However, it may not perform as well for non-linearly separable patterns. The Matlab SVM code from LIBSVM was used, see C.C. Chang and C.J. Lin, LIBSVM: a library for support vector machines, 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm. Table 1 provides the test results.
Table 1
Figure imgf000014_0001
In Table 1, "TN" stands for "True Negative", here, the detection rate of original cover images. "TP" stands for "True Positive", here, the detection rate of stego-images. "Average" is the arithmetic mean of these two rates. In other words, it is the overall correct classification rate for all test images.
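The evaluation protocol and the three reported rates can be sketched as follows. This is an illustration under assumptions rather than the code used for the experiments: all function names are hypothetical, labels are taken to be +1 for stego-images and -1 for cover images, and `evaluate` is a placeholder callback that would train a classifier on the training pairs and return the three rates measured on the test pairs.

```python
import random


def split_half(num_pairs, rng):
    """Randomly split pair indices into a training half and a testing half."""
    indices = list(range(num_pairs))
    rng.shuffle(indices)
    half = num_pairs // 2
    return indices[:half], indices[half:]


def detection_rates(true_labels, predicted_labels):
    """Return (TN rate, TP rate, average) for labels +1 (stego) and -1 (cover)."""
    covers = [p for t, p in zip(true_labels, predicted_labels) if t == -1]
    stegos = [p for t, p in zip(true_labels, predicted_labels) if t == 1]
    if not covers or not stegos:
        raise ValueError("need at least one cover and one stego test image")
    tn_rate = sum(1 for p in covers if p == -1) / len(covers)
    tp_rate = sum(1 for p in stegos if p == 1) / len(stegos)
    return tn_rate, tp_rate, (tn_rate + tp_rate) / 2


def average_over_runs(num_pairs, evaluate, runs=20, seed=0):
    """Average the rates returned by evaluate(train_idx, test_idx) over runs."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(runs):
        train_idx, test_idx = split_half(num_pairs, rng)
        for k, rate in enumerate(evaluate(train_idx, test_idx)):
            totals[k] += rate
    return tuple(t / runs for t in totals)
```

Because each test pair contributes one cover image and one stego-image, the two classes are equally represented, which is why the mean of the TN and TP rates equals the overall correct classification rate.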
The Markov chain based method described in Sullivan et al. was applied to the same set of images and the same data hiding methods. The same training and testing procedures were used. The results are listed in Table 2. This data illustrates that, for this example, the embodiment shown outperforms the method from Sullivan et al., in particular, for LSB methods.
Table 2
Figure imgf000015_0001
Likewise, in another evaluation, a polynomial kernel was employed to train on the 243-D features and the 129-D features from above. The results are listed in Table 3 and Table 4, respectively. Here, in this example, this particular embodiment has a True Positive rate of over 90% for Cox's SS, Piva's blind SS, QIM and LSB with embedding strength over 0.1 bpp. Embedded data here comprises images with sizes ranging from 32x32 to 194x194. Corresponding embedding data rates are from 0.02 bpp to 0.9 bpp and detection rates range from 1.9% to 78%. Thus, compared with the results reported in Lyu and Farid, this particular embodiment appears to outperform the approach shown in Lyu and Farid.
Table 3
Figure imgf000016_0001
Table 4
Figure imgf000017_0001
It will, of course, be understood that, although particular embodiments have just been described, the claimed subject matter is not limited in scope to a particular embodiment or implementation. For example, one embodiment may be in hardware, such as implemented to operate on a device or combination of devices, for example, whereas another embodiment may be in software. Likewise, an embodiment may be implemented in firmware, or as any combination of hardware, software, and/or firmware, for example. Likewise, although claimed subject matter is not limited in scope in this respect, one embodiment may comprise one or more articles, such as a storage medium or storage media. Such storage media, such as one or more CD-ROMs and/or disks, for example, may have stored thereon instructions that, when executed by a system, such as a computer system, computing platform, or other system, for example, may result in an embodiment of a method in accordance with claimed subject matter being executed, such as one of the embodiments previously described, for example. As one potential example, a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive. For example, a display may be employed to display one or more queries, such as those that may be interrelated, and/or one or more tree expressions, although, again, claimed subject matter is not limited in scope to this example.
In the preceding description, various aspects of claimed subject matter have been described. For purposes of explanation, specific numbers, systems and/or configurations were set forth to provide a thorough understanding of claimed subject matter. However, it should be apparent to one skilled in the art having the benefit of this disclosure that claimed subject matter may be practiced without the specific details. In other instances, well known features were omitted and/or simplified so as not to obscure the claimed subject matter. While certain features have been illustrated and/or described herein, many modifications, substitutions, changes and/or equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and/or changes as fall within the true spirit of claimed subject matter.

Claims

1. A method of training a content classification process comprising: processing selected content as follows: forming multiple prediction error sets from neighboring samples of said selected content; thresholding the formed prediction error sets; and training an analysis of variance process using the thresholded prediction error sets.
2. The method of claim 1 wherein said content comprises images.
3. The method of claim 1, wherein said analysis of variance process comprises an SVM process.
4. The method of claim 1, wherein said multiple prediction error sets comprise at least three prediction error images.
5. The method of claim 4, wherein said prediction error images comprise a horizontal prediction error image, a vertical prediction error image and a diagonal prediction error image.
6. The method of claim 1, wherein said thresholding comprises nonuniform thresholding.
7. A method of classifying content comprising: applying a trained analysis of variance process to content; and classifying the content based at least in part on the value obtained from application of the trained analysis of variance process.
8. The method of claim 7, wherein said trained analysis of variance process comprises a trained SVM process.
9. The method of claim 7, wherein said content comprises images.
10. The method of claim 9, wherein said trained analysis of variance process is based at least in part on thresholded prediction error images.
11. A method of training an image classification process comprising: processing a selected image as follows: forming three prediction error images from neighboring pixels of said selected image; thresholding the formed prediction error images; and training an SVM process using the thresholded prediction error images.
12. The method of claim 11, wherein said prediction error images comprise a horizontal prediction error image, a vertical prediction error image and a diagonal prediction error image.
13. The method of claim 11, wherein said thresholding comprises nonuniform thresholding.
14. A method of classifying images comprising: applying a trained SVM process to an image; and classifying the image based at least in part on the value obtained from application of the trained SVM process.
15. The method of claim 14, wherein said trained SVM process is based at least in part on thresholded prediction error images.
16. An article comprising: a storage medium having stored thereon instructions that if executed result in performance of a method of processing selected content as follows: forming multiple prediction error sets from neighboring samples of said selected content; thresholding the formed prediction error sets; and training an analysis of variance process using the thresholded prediction error sets.
17. The article of claim 16, wherein said content comprises images.
18. The article of claim 16, wherein said instructions if executed further result in said analysis of variance process comprising an SVM process.
19. The article of claim 16, wherein said instructions if executed further result in said multiple prediction error sets comprising at least three prediction error images.
20. The article of claim 19, wherein said instructions if executed further result in said prediction error images comprising a horizontal prediction error image, a vertical prediction error image and a diagonal prediction error image.
21. The article of claim 16, wherein said instructions if executed further result in said thresholding comprising non-uniform thresholding.
22. An article comprising: a storage medium having stored thereon instructions that if executed result in performance of a method of classifying content comprising: applying a trained analysis of variance process to content; and classifying the content based at least in part on the value obtained from application of the trained analysis of variance process.
23. The article of claim 22, wherein said instructions if executed further result in said trained analysis of variance process comprising a trained SVM process.
24. The article of claim 22, wherein said content comprises images.
25. The article of claim 24, wherein said instructions if executed further result in said trained analysis of variance process being based at least in part on thresholded prediction error images.
26. An apparatus comprising: means for forming multiple prediction error sets from neighboring samples of said selected content; means for thresholding the formed prediction error sets; and means for training an analysis of variance process using the thresholded prediction error sets.
27. The apparatus of claim 26, wherein said content comprises images.
28. The apparatus of claim 26, wherein said means for training an analysis of variance process comprises means for training an SVM process.
29. The apparatus of claim 26, wherein said means for said thresholding comprises means for non-uniform thresholding.
30. An apparatus comprising: means for applying a trained analysis of variance process to content; and means for classifying the content based at least in part on the value obtained from application of the trained analysis of variance process.
31. The apparatus of claim 30, wherein said means for applying a trained analysis of variance process comprises means for applying a trained SVM process.
32. The apparatus of claim 30, wherein said content comprises images.
33. The apparatus of claim 32, wherein said means for applying a trained SVM process is based at least in part on thresholded prediction error images.
PCT/US2006/001338 2006-01-13 2006-01-13 Method for identifying marked content WO2007081344A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2008550282A JP2009524078A (en) 2006-01-13 2006-01-13 How to identify marked content
PCT/US2006/001338 WO2007081344A1 (en) 2006-01-13 2006-01-13 Method for identifying marked content
EP06718417A EP1971961A4 (en) 2006-01-13 2006-01-13 Method for identifying marked content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2006/001338 WO2007081344A1 (en) 2006-01-13 2006-01-13 Method for identifying marked content

Publications (1)

Publication Number Publication Date
WO2007081344A1 true WO2007081344A1 (en) 2007-07-19

Family

ID=38256629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/001338 WO2007081344A1 (en) 2006-01-13 2006-01-13 Method for identifying marked content

Country Status (3)

Country Link
EP (1) EP1971961A4 (en)
JP (1) JP2009524078A (en)
WO (1) WO2007081344A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5649068A (en) * 1993-07-27 1997-07-15 Lucent Technologies Inc. Pattern recognition system using support vectors
US5768438A (en) * 1994-10-19 1998-06-16 Matsushita Electric Industrial Co., Ltd. Image encoding/decoding device
US6889129B2 (en) * 2002-05-24 2005-05-03 Denso Corporation Vehicle seat occupant classifying method and apparatus based on a support vector machine
US7054847B2 (en) * 2001-09-05 2006-05-30 Pavilion Technologies, Inc. System and method for on-line training of a support vector machine

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LYU, S.; FARID, H.: "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines", PROCEEDINGS OF SPIE, vol. 5306, no. 1, 19 January 2004 (2004-01-19)
RADHAKRISHNAN, R.; KHARRAZI, M; MEMON, N., DATA MASKING: A NEW APPROACH FOR STEGANOGRAPHY?
See also references of EP1971961A4 *

Also Published As

Publication number Publication date
EP1971961A4 (en) 2011-06-29
JP2009524078A (en) 2009-06-25
EP1971961A1 (en) 2008-09-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2008550282

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2006718417

Country of ref document: EP