KR20230006982A

KR20230006982A - Apparatus and method for distinguishing manipulated image based on discrete cosine transform information

Info

Publication number: KR20230006982A
Application number: KR1020210087649A
Authority: KR
Inventors: 이흥규; 권명준; 유인재; 남승훈
Original assignee: 한국과학기술원; (주)디지탈이노텍
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2023-01-12

Abstract

An apparatus for discriminating a manipulated image according to an embodiment of the present invention may include three parts. An image acquisition unit acquires a target image. An image processing unit extracts compression trace information related to discrete cosine transform (DCT) coefficients from the target image. A discrimination unit inputs the compression trace information of the target image into an image discriminator and thereby determines whether the target image contains manipulated pixels. The image discriminator is trained on a plurality of training images, each labeled per pixel with a class indicating whether that pixel's compression trace information has been manipulated.

Description

Apparatus and method for discriminating manipulated images based on DCT space information

The present invention relates to an apparatus and method for discriminating manipulated images based on DCT spatial information.

Image editing technology has advanced, and so have Internet environments for sharing images. As a result, digital images are easily captured, posted online, and distributed to many people through various social network services.

Such digital images may include manipulated images created with image editing programs. Splicing is one common manipulation method: it copies a specific region of one image and pastes it onto another image. Although such manipulation is easy to perform, a manipulated image is difficult to identify with the naked eye.

Manipulated images can spread distorted information, which causes misunderstanding or serves propaganda. Accordingly, research is actively being conducted on manipulated-image discrimination techniques that automatically distinguish manipulated images from original images without human intervention.

Republic of Korea Patent Registration No. 10-1181086 (registered September 3, 2012)

The compression traces left when a digital image is generated differ from the compression traces in the edited region of a manipulated image. The object of the present invention is to exploit this difference. It provides a neural-network-based apparatus and method for discriminating manipulated images, trained to determine from DCT spatial information whether a target image has been manipulated.

However, the objects of the present invention are not limited to those mentioned above. They may include objects that are not mentioned but will be clearly understood by those skilled in the art from the description below.

An apparatus for discriminating a manipulated image according to an embodiment of the present invention may include three parts. An image acquisition unit acquires a target image. An image processing unit extracts DCT (Discrete Cosine Transform) spatial information of the target image. A discrimination unit inputs the DCT spatial information of the target image into an image discriminator and determines whether the target image contains manipulated pixels. The image discriminator is trained on a plurality of training images labeled with classes indicating whether the DCT spatial information they contain has been manipulated.

The DCT spatial information may include the per-pixel DCT coefficients of the target image and the quantization table used to compress the target image.

The image processing unit may first crop the target image so that its width and height are multiples of the horizontal and vertical pixel dimensions of the quantization table used to compress it. It may then extract the DCT spatial information.

The image discriminator may include a DCT neural network built around a transformation module. From the per-pixel DCT coefficients of the target image, the transformation module generates a feature map in which the DCT coefficients are separated by frequency. The DCT neural network performs convolution operations on this feature map to extract a first feature value of the target image.

The transformation module may include a binary volume transformation layer. For each value within a predetermined range of DCT coefficient values, this layer generates a separate feature map that assigns a binary value to the pixels whose DCT coefficient equals that value. It then concatenates these feature maps into a binary volume.

The transformation module may include a frequency separation layer. This layer divides the feature map generated from the binary volume into blocks the size of the quantization table used for the target image. It then gathers the component at the same frequency position from every block, which generates a first frequency-separated feature map.
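The frequency separation layer can be read as a space-to-depth rearrangement keyed to the quantization-table grid. The NumPy sketch below makes three assumptions:

- the table is 8x8 (the usual JPEG block size);
- the input is a (C, H, W) array whose H and W are already multiples of 8;
- the name `separate_by_frequency` and the channel ordering c*64 + u*8 + v are illustrative choices, not taken from the specification.

```python
import numpy as np


def separate_by_frequency(feature_map: np.ndarray, block: int = 8) -> np.ndarray:
    """Regroup a (C, H, W) map so each output channel holds one in-block frequency position.

    Output shape is (C * block * block, H // block, W // block); channel c*block*block + u*block + v
    collects position (u, v) of every block of input channel c.
    """
    c, h, w = feature_map.shape
    if h % block or w % block:
        raise ValueError("H and W must be multiples of the quantization-table size")
    hb, wb = h // block, w // block
    # Split rows into (block row, row-in-block) and columns likewise: (C, hb, u, wb, v)
    x = feature_map.reshape(c, hb, block, wb, block)
    # Move the in-block offsets (u, v) next to the channel axis: (C, u, v, hb, wb)
    x = x.transpose(0, 2, 4, 1, 3)
    return x.reshape(c * block * block, hb, wb)


fm = np.arange(16 * 16).reshape(1, 16, 16)
out = separate_by_frequency(fm)
print(out.shape)  # (64, 2, 2)
```

Each output pixel corresponds to one 8x8 block, and each channel to one frequency position within the block. This is why the cropping step keeps H and W at multiples of the table size.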

The transformation module may also tile the quantization table into a block-array feature map that matches the size of the feature map generated from the binary volume. It multiplies the two element-wise and extracts the same-frequency components from the product, which yields a second frequency-separated feature map. It then concatenates the second frequency-separated feature map with the first. The result is a feature map in which the per-pixel DCT coefficients of the target image are separated by frequency.
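The following is a minimal sketch of the quantization-table branch and the concatenation. It makes four assumptions:

- the quantization table is 8x8 and stored in natural (row-major) order;
- the feature map has shape (C, H, W);
- `frequency_features` is a hypothetical name;
- the first frequency-separated map is placed before the second, which is an arbitrary choice because the text does not fix the order.

The regrouping helper is repeated so the block runs on its own.

```python
import numpy as np


def separate_by_frequency(x: np.ndarray, block: int = 8) -> np.ndarray:
    # Regroup (C, H, W) -> (C * block^2, H / block, W / block), one channel per frequency position
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block).transpose(0, 2, 4, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)


def frequency_features(feature_map: np.ndarray, qtable: np.ndarray) -> np.ndarray:
    """Concatenate the plain and quantization-weighted frequency-separated maps (sketch).

    feature_map: (C, H, W) map derived from the binary volume, H and W multiples of 8.
    qtable: (8, 8) quantization table in natural order.
    """
    _, h, w = feature_map.shape
    bh, bw = qtable.shape
    # Block-array feature map: tile the table so it covers the whole H x W grid
    block_array = np.tile(qtable, (h // bh, w // bw))
    first = separate_by_frequency(feature_map, bh)
    # Element-wise product (broadcast over channels), then the same frequency regrouping
    second = separate_by_frequency(feature_map * block_array[None, :, :], bh)
    return np.concatenate([first, second], axis=0)


fm = np.ones((2, 16, 16))
q = np.arange(1, 65).reshape(8, 8)
feat = frequency_features(fm, q)
print(feat.shape)  # (256, 2, 2)
```

Every second-branch channel for frequency (u, v) is scaled by the table entry q[u, v]. The network therefore sees, per frequency, both the binary evidence and the quantization step that produced it.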

The binary volume transformation layer works on the DCT coefficient at each two-dimensional pixel position (i, j) of the target image. For each value from 0 to T in order, where T is a predetermined natural number, it may generate a separate feature map of 0s and 1s according to Equation 1 below. It then concatenates the resulting (T+1) feature maps.

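[Equation 1]
$$ B_{t,i,j} = \begin{cases} 1, & \text{if } \operatorname{abs}\big(\operatorname{clip}(M)_{ij}\big) = t \\ 0, & \text{otherwise} \end{cases} \qquad (t = 0, 1, \ldots, T) $$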

(where M denotes the per-pixel DCT coefficients of the target image, and clip(M)_ij converts the DCT coefficient of M at coordinate (i, j) to T if it is greater than T and to -T if it is less than -T. abs() is the absolute value function.)
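Equation 1 amounts to one-hot encoding the clipped coefficient magnitudes. The sketch below assumes an integer (H, W) coefficient array M. The default T = 20 is only an illustrative value for the "predetermined natural number".

```python
import numpy as np


def binary_volume(dct_coeffs: np.ndarray, T: int = 20) -> np.ndarray:
    """One-hot encode abs(clip(M)) over the values 0..T, following Equation 1.

    dct_coeffs: (H, W) integer array M of quantized DCT coefficients.
    Returns a (T + 1, H, W) uint8 array; channel t is 1 where abs(clip(M)_ij) == t.
    """
    clipped = np.clip(dct_coeffs, -T, T)  # clip(M): above T -> T, below -T -> -T
    magnitude = np.abs(clipped)           # abs(): values now lie in 0..T
    levels = np.arange(T + 1).reshape(-1, 1, 1)
    return (magnitude[None, :, :] == levels).astype(np.uint8)


M = np.array([[0, -3], [25, -30]])
B = binary_volume(M, T=20)
print(B.shape)  # (21, 2, 2)
```

In this example, channel 0 is set at (0, 0) and channel 3 at (0, 1). Channel 20 is set at both bottom positions, because 25 and -30 are clipped to magnitude 20. Every pixel lands in exactly one channel.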

The image discriminator may further include two networks. An RGB neural network performs convolution operations on a feature map generated from the per-pixel RGB information of the target image, producing a second feature value. A fusion neural network receives the first and second feature values and performs convolution operations to generate a third feature value that reflects both the DCT information and the RGB information of the target image.

A method for discriminating a manipulated image, performed by the apparatus according to an embodiment of the present invention, may include three steps:

1. acquiring a target image;
2. extracting DCT (Discrete Cosine Transform) spatial information of the target image;
3. inputting the DCT spatial information of the target image into an image discriminator to determine whether the target image contains manipulated pixels. The discriminator is trained on a plurality of training images labeled with classes indicating whether the DCT spatial information they contain has been manipulated.

According to embodiments of the present invention, the target image is cropped based on the grid size of the quantization table, and the DCT coefficients are converted into a discrete volume. This allows per-frequency DCT coefficients to be used in deep-learning image discrimination algorithms and places the DCT coefficient domain and the RGB information domain in the same domain. As a result, compression traces in the target image can be analyzed with high performance to determine whether the target image has been manipulated.

The effects obtainable from the present invention are not limited to those mentioned above. Other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 shows the results of analyzing the DCT coefficient distributions of an original region and a manipulated region in a target image.
FIG. 2 is a functional block diagram of an apparatus for discriminating manipulated images according to an embodiment of the present invention.
FIG. 3 illustrates the neural network structure of an image discriminator according to an embodiment of the present invention.
FIG. 4 illustrates the structure of a convolution unit used in HRNet.
FIG. 5 illustrates the structure of a fusion convolution unit used in HRNet.
FIG. 6 illustrates the structure of the transformation module of a DCT neural network according to an embodiment of the present invention.
FIGS. 7 and 8 illustrate the operation of the frequency separation layer according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method for discriminating manipulated images according to an embodiment of the present invention.
FIG. 10 illustrates results of the method for discriminating manipulated images according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it belongs. The scope of the invention is defined only by the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted unless actually needed. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the content of this entire specification.

Terms such as '...unit' and '...device' used below refer to a unit that processes at least one function or operation. Such a unit may be implemented in hardware, software, or a combination of hardware and software.

Image manipulation may include producing a processed image by altering at least some elements of an original image. If some elements of the original image have been altered, the result may be called a manipulated image. For example, splicing is a common manipulation method that copies a specific region of one image and pastes it onto another image. Although such manipulation is easy, the manipulated image is difficult to identify with the naked eye.

Embodiments of the present invention start from the goal of analyzing compression traces in the pixels of an image to determine whether a target image has been manipulated.

Typically, a photograph undergoes one round of lossy compression (e.g., JPEG) when a camera captures it. When the image is later manipulated and saved, it is compressed again. The original region, whose pixel values are unchanged, is recompressed on the same grid. The manipulated region, whose pixel values have changed, is compressed on a new grid.

Thus pixels in the original region are compressed twice on the same grid. The manipulated region, by contrast, has passed through a compression process and environment different from those used to create the original image. The compression traces of the original and manipulated regions therefore differ. As an example, the DCT (Discrete Cosine Transform) spatial information of the original and manipulated regions of a target image can be examined as follows.

FIG. 1 shows the results of analyzing the DCT coefficient distributions of an original region and a manipulated region in a target image.

Referring to FIG. 1(a), the green region of the image is the original region. The orange region is a manipulated region, edited from another image and pasted in. Referring to FIG. 1(b), the distribution of DCT coefficients derived from the original region differs from the distribution derived from the manipulated region.
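Double quantization of a single DCT coefficient is one well-known way to see why recompressing on the same grid leaves a distinctive trace. The toy below is only an illustration, not the method of this specification. It makes three simplifications:

- synthetic Laplacian data stands in for one frequency's coefficients;
- the quantization steps q1 = 3 and q2 = 2 are arbitrary;
- it ignores the pixel-domain rounding of a real decode/re-encode cycle.

```python
import numpy as np


def quantize(coeffs: np.ndarray, q: int) -> np.ndarray:
    # JPEG-style quantization: divide by the table entry and round to the nearest integer
    return np.round(coeffs / q).astype(int)


def empty_bins(levels: np.ndarray, lo: int = -10, hi: int = 10) -> list:
    # Quantized levels in [lo, hi] that never occur
    present = set(levels[(levels >= lo) & (levels <= hi)].tolist())
    return [k for k in range(lo, hi + 1) if k not in present]


rng = np.random.default_rng(0)
# Synthetic stand-in for one AC frequency's unquantized DCT coefficients
raw = rng.laplace(loc=0.0, scale=10.0, size=100_000)

# Compressed once with step q2 = 2
single = quantize(raw, 2)
# Compressed with q1 = 3, dequantized, then recompressed on the same grid with q2 = 2
double = quantize(quantize(raw, 3) * 3, 2)

print(empty_bins(single))  # []
print(empty_bins(double))  # [-7, -5, -1, 1, 5, 7]
```

Quantizing twice on the same grid leaves periodic empty bins in the coefficient histogram, whereas a single quantization fills every bin. This kind of statistical difference is what a distribution comparison like FIG. 1(b) can expose.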

However, existing deep-learning image discrimination algorithms perform poorly when they take the raw DCT coefficient distribution directly as input. The cause is the strong decorrelation property of DCT coefficients.

Accordingly, the apparatus 100 for discriminating manipulated images according to an embodiment of the present invention proposes the following scheme. It converts DCT coefficients into a discrete volume so that deep-learning image discrimination algorithms can use them. It also places the DCT coefficient domain and the RGB information domain in the same domain, so that feature values derived from each can be used together.

FIG. 2 is a functional block diagram of the apparatus 100 for discriminating manipulated images according to an embodiment of the present invention. One or more processors may perform the overall operation of the apparatus 100. These processors may control the functional blocks in FIG. 2 to perform the operations described below.

Referring to FIG. 2, the apparatus 100 for discriminating manipulated images according to an embodiment of the present invention may include:

- a storage unit 110;
- an image acquisition unit 120;
- an image processing unit 130;
- a discrimination unit 140;
- a discriminator generation unit 150.

The storage unit 110 may store various data used in embodiments of the present invention, for example, the target image, the training images, and the image discriminator 200. The storage unit 110 may be implemented as hardware memory inside the apparatus 100. Alternatively, it may be a module that interworks with a cloud database located outside the apparatus 100.

The image acquisition unit 120 may acquire a target image, either through external input or by loading data stored in the storage unit 110. The target image is the image whose manipulation is to be determined. For example, it may be a JPEG image lossily compressed according to the Joint Photographic Experts Group (JPEG) standard.

The image processing unit 130 may extract the DCT (Discrete Cosine Transform) spatial information of the target image. For example, the DCT spatial information may include two items:

- the per-pixel DCT coefficients of the target image (e.g., Y-channel DCT coefficients);
- the quantization table used to compress the target image (e.g., the Y-channel quantization table).

For example, when extracting the per-pixel DCT coefficients of the target image, the image processing unit 130 may first crop the image so that its width and height are multiples of the quantization table's horizontal and vertical pixel dimensions. It then extracts the per-pixel DCT coefficients. The crop is based on the quantization table size so that the DCT coefficients of the target image can be derived for each frequency, as described in detail with reference to FIGS. 7 and 8.
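The following is a minimal sketch of the cropping step, with three assumptions:

- the quantization table is 8x8;
- the crop keeps the top-left origin. Pixel (0, 0) is where the JPEG block grid starts, so this preserves block alignment;
- `crop_to_grid` is a hypothetical helper name.

```python
import numpy as np


def crop_to_grid(image: np.ndarray, qtable_shape: tuple = (8, 8)) -> np.ndarray:
    """Crop an (H, W, ...) array so H and W are multiples of the quantization-table size.

    The top-left origin is kept, so the result stays aligned with the JPEG block grid.
    """
    bh, bw = qtable_shape
    h, w = image.shape[:2]
    return image[: h - h % bh, : w - w % bw]


img = np.zeros((37, 50, 3), dtype=np.uint8)
print(crop_to_grid(img).shape)  # (32, 48, 3)
```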

For example, the image processing unit 130 may obtain the quantization table used to compress the target image by reading the header of the target image's file.
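Reading the table from the file header can be sketched by walking the JPEG marker segments defined in ITU-T T.81. Each DQT (0xFFDB) segment is decoded in turn, and its 64 entries are stored in zigzag order. This is a minimal illustration that ignores many error cases a production decoder would handle; in practice a JPEG library would normally be used. `read_quantization_tables` is a hypothetical name.

```python
import struct


def _zigzag_order(n: int = 8) -> list:
    # Natural (row-major) index of each zigzag position: walk anti-diagonals s = r + c,
    # moving down-left on odd diagonals and up-right on even ones.
    def key(idx: int):
        r, c = divmod(idx, n)
        s = r + c
        return (s, r if s % 2 else -r)
    return sorted(range(n * n), key=key)


ZIGZAG = _zigzag_order()


def read_quantization_tables(data: bytes) -> dict:
    """Return {table_id: 8x8 list of rows in natural order} from a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xDA:  # SOS: entropy-coded data follows, headers are done
            break
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])  # counts its own 2 bytes
        if marker == 0xDB:  # DQT segment, possibly holding several tables
            seg = data[pos + 4 : pos + 2 + length]
            i = 0
            while i < len(seg):
                precision, table_id = seg[i] >> 4, seg[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = list(seg[i : i + 64])
                    i += 64
                else:  # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", seg[i : i + 128]))
                    i += 128
                natural = [0] * 64
                for k, v in enumerate(values):
                    natural[ZIGZAG[k]] = v
                tables[table_id] = [natural[r * 8 : r * 8 + 8] for r in range(8)]
        pos += 2 + length
    return tables


# Synthetic header: SOI, one 8-bit DQT table (id 0) holding 1..64 in zigzag order, then SOS
body = bytes([0x00]) + bytes(range(1, 65))
jpeg = b"\xff\xd8" + b"\xff\xdb" + struct.pack(">H", 2 + len(body)) + body + b"\xff\xda\x00\x02"
print(read_quantization_tables(jpeg)[0][1][0])  # 3
```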

The image processing unit 130 may also derive the RGB information of the pixels constituting the target image.

For example, the image processing unit 130 may derive RGB information for each pixel of the target image. This information consists of element values (x, y, z) representing the pixel's R (red), G (green), and B (blue) components. From it, the unit may generate three feature maps, each consisting of only the R, G, or B elements. If the target image carries no DCT coefficient information, the image processing unit 130 may compute the DCT coefficients directly by applying the discrete cosine transform to the RGB pixel information. In that case it sets every value of the quantization table to 1.

The discrimination unit 140 may input the DCT spatial information or the RGB information of the target image into the image discriminator 200 to determine whether the target image contains manipulated pixels. The image discriminator 200 is trained with a predetermined deep-learning image discrimination algorithm.
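The text does not specify how the per-pixel output becomes an image-level decision. One simple possibility, sketched below as a hypothetical rule, thresholds a per-pixel "manipulated" probability map. `threshold` and `min_pixels` are illustrative parameters, not values from the specification.

```python
import numpy as np


def contains_manipulation(prob_map: np.ndarray, threshold: float = 0.5, min_pixels: int = 1) -> bool:
    """Hypothetical decision rule: flag the image if enough pixels look manipulated.

    prob_map: (H, W) per-pixel probability of the 'manipulated' class.
    """
    mask = prob_map >= threshold
    return int(mask.sum()) >= min_pixels


p = np.zeros((4, 4))
p[1, 2] = 0.9
print(contains_manipulation(p))  # True
```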

FIG. 3 illustrates the neural network structure of the image discriminator 200 according to an embodiment of the present invention.

Referring to FIG. 3, the image discriminator 200 according to an embodiment of the present invention may include a DCT neural network, an RGB neural network, and a fusion neural network. In FIG. 3, each of these three networks is built on the structure of HRNet (High-Resolution Representation Learning Network). For ease of understanding, the embodiments describe the transformation module 215 applied to an HRNet-based network. The neural network of the image discriminator 200 may also be designed by applying the transformation module 215 to the structures of various other deep-learning image discrimination algorithms.

The DCT neural network 210 may include the transformation module 215 and an HRNet-based network. It may be designed to extract the first feature value of the target image by performing convolution operations on the feature map generated from the DCT spatial information.

The transformation module 215 works by converting DCT coefficients into a discrete volume. From the per-pixel DCT coefficients of the target image, it may generate a feature map in which the DCT coefficients are separated by frequency. This lets the deep-learning image discrimination algorithm use the DCT coefficients. It also places the DCT coefficient domain and the RGB information domain in the same domain, so that feature values derived from each can be used together. The detailed structure of the transformation module 215 is described below with reference to FIGS. 6 to 8.

The RGB neural network 220 may include an HRNet-based network. It may be designed to extract the second feature value of the target image by performing convolution operations on a feature map generated from the per-pixel RGB information of the target image.

The fusion neural network 230 may include an HRNet-based network. It may be designed to receive the first and second feature values and perform convolution operations on them. The output is a third feature value that reflects both the DCT spatial information and the RGB information of the target image.
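Fusion requires the DCT and RGB feature maps to share a spatial grid, which is what placing both in the same domain provides. The sketch below is a simplified stand-in for the HRNet-based fusion network and does not reproduce its actual structure. It concatenates the two maps along the channel axis and applies one 1x1 convolution with ReLU. `fuse_features` is a hypothetical name, and the weights would come from training.

```python
import numpy as np


def fuse_features(dct_feat: np.ndarray, rgb_feat: np.ndarray,
                  weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Channel-concatenate DCT and RGB features, then 1x1 convolution + ReLU (sketch).

    dct_feat: (C1, H, W); rgb_feat: (C2, H, W) on the same H x W grid.
    weight: (C_out, C1 + C2); bias: (C_out,). Returns (C_out, H, W).
    """
    if dct_feat.shape[1:] != rgb_feat.shape[1:]:
        raise ValueError("DCT and RGB feature maps must share the same spatial grid")
    x = np.concatenate([dct_feat, rgb_feat], axis=0)  # (C1 + C2, H, W)
    y = np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]  # 1x1 convolution
    return np.maximum(y, 0.0)


out = fuse_features(np.ones((2, 4, 4)), 2 * np.ones((3, 4, 4)), np.ones((1, 5)), np.zeros(1))
print(out.shape)  # (1, 4, 4)
```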

The discriminator generation unit 150 may train the neural network of FIG. 3, in which the DCT neural network 210, the RGB neural network 220, and the fusion neural network 230 are connected. Training runs end-to-end from the network's first input to its final output. After training, when the DCT coefficients, quantization table, and RGB information of a target image are fed to the input, the network determines whether each pixel of the target image has been manipulated. For example, the discriminator generation unit 150 may train the image discriminator 200 on one of two kinds of training images:

- training images labeled with classes indicating whether their DCT spatial information has been manipulated;
- training images labeled with classes indicating whether their RGB information has been manipulated.
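Because the training images are labeled per pixel, end-to-end training needs a per-pixel classification loss. The specification does not name one. The sketch below assumes a standard two-class softmax cross-entropy averaged over pixels, with label 1 marking a manipulated pixel.

```python
import numpy as np


def pixelwise_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean per-pixel cross-entropy for 2-class (original / manipulated) maps (sketch).

    logits: (2, H, W) raw scores; labels: (H, W) integers, 1 = manipulated pixel.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = labels.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Pick, at every pixel, the log-probability of that pixel's labeled class
    return float(-log_probs[labels, rows, cols].mean())


loss = pixelwise_cross_entropy(np.zeros((2, 3, 3)), np.zeros((3, 3), dtype=int))
print(round(loss, 4))  # 0.6931
```

With uninformative (all-zero) logits, every pixel gets probability 0.5, so the loss equals ln 2.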

The image discriminator 200 illustrated in FIG. 3 includes the DCT neural network, the RGB neural network, and the fusion neural network, all trained and used together. In another embodiment of the present invention, the image discriminator 200 may instead consist of either the DCT neural network or the RGB neural network alone. In that case, the discriminator generation unit 150 may train the standalone network end-to-end from its input to its output. The result is an image discriminator 200 consisting of only a DCT neural network or only an RGB neural network.

Once the image discriminator 200 is trained according to the embodiments above, the discriminator generation unit 150 may store it in the storage unit 110. To determine whether a target image has been manipulated, the discrimination unit 140 may input the DCT spatial information or RGB information extracted by the image processing unit 130 into the stored image discriminator 200.

FIG. 4 illustrates the detailed structure of a convolution unit used in the HRNet structure of FIG. 3. FIG. 5 illustrates the structure of a fusion convolution unit used in the HRNet structure of FIG. 3. These are merely examples of the two unit structures and do not limit the embodiments of the present invention. Because the convolution unit and fusion convolution unit of HRNet are known structures, their detailed description is omitted.

FIG. 6 is an exemplary diagram of the structure of the conversion module 215 of the DCT neural network according to an embodiment of the present invention.

The structure above the dotted line in FIG. 6(a) is described first.

Above the dotted line in (a), the conversion module 215 includes a binary volume conversion layer 216, a convolutional layer, and a frequency-specific separation layer 217. By converting the DCT coefficients into a binary volume, it generates, from the per-pixel DCT coefficients 20 of the target image, a feature map 25 in which the DCT coefficients are separated by frequency.

The binary volume conversion layer 216 operates on the DCT coefficient at each two-dimensional pixel position (i, j) of the target image within a predetermined range of DCT values, bounded in this example by a natural number T. For each value t from 0 to T in turn, it generates a separate feature map whose element at (i, j) is set to 0 or 1 according to Equation 1 below. After these (T+1) passes, it concatenates the (T+1) generated feature maps into a single feature map.

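[Equation 1]

$$B_t(i, j) = \begin{cases} 1, & \text{if } \mathrm{abs}\big(\mathrm{clip}(M)_{ij}\big) = t \\ 0, & \text{otherwise} \end{cases} \qquad t = 0, 1, \ldots, T$$

(B_t: the t-th feature map; M: per-pixel DCT coefficients of the target image; clip(·): limits each coefficient to the range [-T, T]; abs(·): absolute value)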

That is, according to Equation 1, the feature map undergoes the transformation $\mathbb{Z}^{H \times W} \rightarrow \{0, 1\}^{(T+1) \times H \times W}$. An H x W target image whose elements are integers (the set $\mathbb{Z}$) is converted into (T+1) feature maps of size H x W whose elements are 0 or 1.
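The binary volume conversion can be sketched in a few lines of NumPy. This is a minimal illustration only. It assumes that Equation 1 sets channel t to 1 wherever abs(clip(M)) equals t, which is consistent with the clip() and abs() definitions given in the claims. The function name `binary_volume` is hypothetical, and no particular value of T is implied.

```python
import numpy as np

def binary_volume(dct: np.ndarray, T: int) -> np.ndarray:
    """Convert an H x W integer DCT-coefficient map into a (T+1) x H x W binary volume.

    Channel t is 1 wherever abs(clip(M)) == t and 0 elsewhere, so every pixel is
    one-hot across the T+1 channels.
    """
    clipped = np.clip(dct, -T, T)                 # clip(M): saturate to [-T, T]
    magnitude = np.abs(clipped)                   # abs(clip(M)) takes values 0..T
    levels = np.arange(T + 1).reshape(-1, 1, 1)   # one channel per value t = 0..T
    return (magnitude[np.newaxis] == levels).astype(np.uint8)

M = np.array([[0, -3], [25, 1]])
B = binary_volume(M, T=4)
print(B.shape)     # (5, 2, 2)
print(B[:, 1, 0])  # [0 0 0 0 1]  (25 is clipped to T = 4)
```

Comparing against a column of levels through broadcasting builds all (T+1) maps at once. This is equivalent to generating them one at a time and concatenating them.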

The conversion module 215 applies convolution operations to the binary volume to produce a feature map 23. In the example of FIG. 6, these are a 3x3 convolution, batch normalization, ReLU, a 1x1 convolution, batch normalization, and ReLU. The module then divides the feature map 23 into units the size of the quantization table used for the target image (e.g., 8x8 for JPEG). Through the frequency-specific separation layer 217-1, it extracts the components of the same frequency from every grid delimited in this way, producing a first frequency-specific feature map 25.

FIGS. 7 and 8 are exemplary diagrams illustrating the operation of the frequency-specific separation layer 217 according to an embodiment of the present invention.

Referring to FIG. 7, suppose the feature map is 64x64 pixels and the quantization table is 8x8. The frequency-specific separation layer 217 then divides the 64x64 feature map into units of the 8x8 quantization-table size, as shown in FIG. 6(a). Horizontally, this gives (64 horizontal pixels of the feature map) / (8 horizontal entries of the quantization table) = 8 divisions. Vertically, it gives (64 vertical pixels of the feature map) / (8 vertical entries of the quantization table) = 8 divisions. The frequency-specific separation layer 217 therefore splits the 64x64 feature map into 64 grids of 8x8 pixels each (e.g., grid a1 and grid a2), as shown in FIG. 7.
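The division into quantization-table-sized grids amounts to a reshape and a transpose. The following NumPy sketch assumes a single-channel map whose height and width are already multiples of the table size. The function name `split_into_grids` is hypothetical.

```python
import numpy as np

def split_into_grids(fmap: np.ndarray, q: int = 8) -> np.ndarray:
    """Split an H x W map into (H/q) x (W/q) grids of q x q pixels each.

    Returns an array indexed as [grid_row, grid_col, u, v], where (u, v) is the
    position inside a grid (u, v in 0..q-1).
    """
    H, W = fmap.shape
    assert H % q == 0 and W % q == 0, "crop to a multiple of the table size first"
    return fmap.reshape(H // q, q, W // q, q).transpose(0, 2, 1, 3)

fmap = np.arange(64 * 64).reshape(64, 64)
grids = split_into_grids(fmap)
print(grids.shape)  # (8, 8, 8, 8): 8 x 8 = 64 grids, each 8 x 8 pixels
```

The reshape does not copy pixel values. It only reinterprets the map so that each 8x8 grid can be addressed as its own block.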

Referring to FIG. 8, the frequency-specific separation layer 217 applies Equation 2 below. It collects the 64 element values at coordinate (0, 0) of all 64 grids produced by the operation of FIG. 7 and forms them into a single feature map A.

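[Equation 2]

$$X'_{\,64c + 8(i \bmod 8) + (j \bmod 8),\ \lfloor i/8 \rfloor,\ \lfloor j/8 \rfloor} = X_{c,\, i,\, j}$$

(shown for an 8x8 quantization table)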

(c: index of the feature map; i: horizontal pixel index of the feature map; j: vertical pixel index of the feature map)

In the case of FIG. 8, there is only one feature map before frequency separation, so c = {0}, i = {0, 1, 2, 3, 4, 5, 6, 7}, and j = {0, 1, 2, 3, 4, 5, 6, 7}. The frequency-specific separation layer 217 first takes the (0, 0) element a1 of the top-left grid of the 0th feature map, then the (0, 0) element a2 of the next grid, and so on. It repeats this for every in-grid coordinate from (0, 0) to (7, 7). The result is a frequency-specific feature map whose dimensions have been rearranged into 64 feature maps, each holding the DCT coefficients of a single frequency.

That is, according to Equation 2, the feature map undergoes the dimensional transformation $\mathbb{R}^{C \times H \times W} \rightarrow \mathbb{R}^{64C \times \frac{H}{8} \times \frac{W}{8}}$. The frequency-specific separation layer 217 performs frequency separation by converting a real-valued C x H x W feature map into a 64C x (H/8) x (W/8) feature map. In the example of FIG. 8, C = 1 and H = W = 64, giving 64 feature maps of 8x8.
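The following NumPy sketch shows the frequency-wise rearrangement. It assumes the output channel ordering c·64 + 8·(i mod 8) + (j mod 8) and the grid position (⌊i/8⌋, ⌊j/8⌋). That ordering is one layout consistent with the description of FIG. 8, not a confirmed reproduction of Equation 2. The function name `frequency_separate` is hypothetical.

```python
import numpy as np

def frequency_separate(x: np.ndarray, q: int = 8) -> np.ndarray:
    """Rearrange a C x H x W map into a (q*q*C) x (H/q) x (W/q) map.

    Output channel c*q*q + q*u + v, at grid position (a, b), holds the input
    value at (c, a*q + u, b*q + v): every grid contributes its (u, v) element
    to the same output channel, so each channel gathers one DCT frequency.
    """
    C, H, W = x.shape
    assert H % q == 0 and W % q == 0
    y = x.reshape(C, H // q, q, W // q, q)      # [c, a, u, b, v]
    y = y.transpose(0, 2, 4, 1, 3)              # [c, u, v, a, b]
    return y.reshape(C * q * q, H // q, W // q)

x = np.arange(64 * 64).reshape(1, 64, 64)   # one 64 x 64 feature map, as in FIG. 8
out = frequency_separate(x)
print(out.shape)                   # (64, 8, 8)
print(out[0, 0, 0], out[0, 0, 1])  # 0 8
```

Channel 0 of the output plays the role of feature map A. Its first two entries are the (0, 0) elements of the top-left grid (a1) and of the grid to its right (a2).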

Next, the structure below the dotted line in FIG. 6(a) and the feature map it finally produces are described.

Below the dotted line in (a), the conversion module 215 uses the block array layer 218 to tile the quantization table 30 repeatedly. This produces a block array feature map 31 of the same size as the feature map 23 generated from the binary volume 21. The module multiplies the block array feature map 31 element-wise with the feature map 23. It then extracts same-frequency components from the product through the frequency-specific separation layer 217-2 to obtain a second frequency-specific feature map 35. Finally, it joins the second frequency-specific feature map 35 with the first frequency-specific feature map 25. The result is a feature map 27 in which the per-pixel DCT coefficients of the target image are separated by frequency.
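The two branches of FIG. 6(a) can be combined as in the NumPy sketch below. It makes several assumptions:

- The block array layer is modeled as plain tiling of the quantization table.
- Both branches reuse the same frequency-wise rearrangement.
- "Joining" is modeled as concatenation along the channel axis.

The function names and the example inputs are hypothetical, and the learned convolutions that produce feature map 23 are not modeled.

```python
import numpy as np

def frequency_separate(x: np.ndarray, q: int = 8) -> np.ndarray:
    """C x H x W -> (q*q*C) x (H/q) x (W/q), one channel per in-block frequency."""
    C, H, W = x.shape
    y = x.reshape(C, H // q, q, W // q, q).transpose(0, 2, 4, 1, 3)
    return y.reshape(C * q * q, H // q, W // q)

def block_array(qtable: np.ndarray, H: int, W: int) -> np.ndarray:
    """Tile a q x q quantization table until it covers an H x W map."""
    q = qtable.shape[0]
    return np.tile(qtable, (H // q, W // q))

def conversion_output(feat: np.ndarray, qtable: np.ndarray) -> np.ndarray:
    """Combine both branches for a C x H x W map computed from the binary volume."""
    _, H, W = feat.shape
    first = frequency_separate(feat)                   # first frequency-specific map (25)
    tiled = block_array(qtable, H, W)                  # block array feature map (31)
    second = frequency_separate(feat * tiled[None])    # second frequency-specific map (35)
    return np.concatenate([first, second], axis=0)     # joined along channels (27)

feat = np.ones((2, 16, 16))                 # hypothetical C = 2 map from the binary volume
qtable = np.arange(1, 65).reshape(8, 8)     # hypothetical 8 x 8 quantization table
out = conversion_output(feat, qtable)
print(out.shape)  # (256, 2, 2)
```

With an all-ones input, each channel of the second half simply reproduces the quantization step of its frequency. This makes visible how the lower branch injects the table values into the frequency-separated features.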

Accordingly, the target image is first cropped according to the grid size of the quantization table. The conversion module 215 converts the resulting per-pixel DCT coefficients 20 into a binary volume and generates a feature map of frequency-specific DCT coefficients. This places the DCT-coefficient features in the same domain as the RGB features.
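Cropping according to the quantization-table grid can be as simple as trimming the bottom and right edges. The sketch below assumes an 8x8 table by default. It keeps the top-left origin so that the block grid of a JPEG decoded from (0, 0) stays aligned, which is an assumption about how the crop is anchored. The function name is hypothetical.

```python
import numpy as np

def crop_to_table_multiple(img: np.ndarray, table_h: int = 8, table_w: int = 8) -> np.ndarray:
    """Crop height and width down to multiples of the quantization-table size.

    Keeps the top-left origin, so the block grid starting at (0, 0) stays
    aligned with the cropped array. Works for H x W or H x W x C arrays.
    """
    H, W = img.shape[:2]
    return img[: H - H % table_h, : W - W % table_w]

img = np.zeros((70, 45, 3))
print(crop_to_table_multiple(img).shape)  # (64, 40, 3)
```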

FIG. 9 is a flowchart of a method for discriminating a manipulated image according to an embodiment of the present invention.

Each step of the manipulated image discrimination method of FIG. 9 may be performed by the manipulated image discrimination device 100 described with reference to FIG. 2. The steps are as follows.

In step S910, the image acquisition unit 120 may acquire a target image.

In step S920, the image processing unit 130 may extract DCT spatial information from the target image.

In step S930, the determination unit 140 may input the DCT spatial information of the target image to the image discriminator 200. The discriminator has been trained on a plurality of training images, each labeled with a class indicating whether its DCT spatial information has been manipulated. From its output, the determination unit determines whether the target image contains manipulated pixels.
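Steps S910 to S930 can be outlined as a small function with the image source and the trained discriminator passed in. This is only a structural sketch. The DCT extractor and the discriminator are hypothetical callables (stubs stand in for them in the usage). The assumption that the discriminator returns a per-pixel probability map compared against a threshold is illustrative, not taken from this document.

```python
from typing import Callable, Tuple
import numpy as np

# (per-pixel DCT coefficients, quantization table): the DCT spatial information
DctInfo = Tuple[np.ndarray, np.ndarray]

def detect_manipulation(
    image_path: str,
    extract_dct_info: Callable[[str], DctInfo],
    discriminator: Callable[[np.ndarray, np.ndarray], np.ndarray],
    threshold: float = 0.5,
) -> Tuple[bool, np.ndarray]:
    # S910: the target image is acquired, here identified by a file path.
    # S920: extract DCT spatial information from the target image.
    coeffs, qtable = extract_dct_info(image_path)
    # S930: the trained discriminator scores each pixel; any pixel at or above
    # the threshold is treated as manipulated.
    prob_map = discriminator(coeffs, qtable)
    mask = prob_map >= threshold
    return bool(mask.any()), mask

# Hypothetical stubs standing in for a real JPEG decoder and a trained network.
def stub_extract(path: str) -> DctInfo:
    return np.zeros((16, 16), dtype=np.int32), np.ones((8, 8), dtype=np.int32)

def stub_discriminator(coeffs: np.ndarray, qtable: np.ndarray) -> np.ndarray:
    probs = np.zeros(coeffs.shape)
    probs[3, 4] = 0.9          # pretend one pixel looks spliced
    return probs

found, mask = detect_manipulation("target.jpg", stub_extract, stub_discriminator)
print(found, int(mask.sum()))  # True 1
```

Passing the extractor and discriminator as arguments keeps the three-step flow independent of any particular decoder or network implementation.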

The operations by which the components carry out each of the steps above were described with reference to FIGS. 2 to 8, so a duplicate description is omitted.

FIG. 10 shows, for each input image, the ground-truth mask, the manipulation discrimination result of an embodiment of the present invention, the result of ManTra-Net, and the result of EXIF-SC.

Referring to FIG. 10, the embodiment of the present invention distinguishes original from manipulated regions considerably more accurately than the existing ManTra-Net and EXIF-SC methods. The embodiment crops the target image according to the grid size of the quantization table and, on that basis, converts the DCT coefficients into a binary volume. This lets a deep-learning image discrimination algorithm use frequency-specific DCT coefficients. It also places the DCT-coefficient features in the same domain as the RGB features. As a result, the compression traces of the target image can be analyzed with high performance to determine whether the image has been manipulated.

The above description merely illustrates the technical idea of the present invention. Those of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below. All technical ideas within the scope of their equivalents should be construed as falling within the scope of rights of the present invention.

100: manipulated image discrimination device 110: storage unit 120: image acquisition unit
130: image processing unit 140: determination unit 150: discriminator generation unit
200: image discriminator 210: DCT neural network 220: RGB neural network
230: synthetic neural network
215: conversion module 216: binary volume conversion layer
217: frequency-specific separation layer 218: block array layer

Claims

an image acquisition unit configured to acquire a target image;
an image processing unit configured to extract DCT (discrete cosine transform) spatial information from the target image; and
a determination unit configured to input the DCT spatial information of the target image to an image discriminator trained on a plurality of training images, each labeled with a class indicating whether the DCT spatial information included in the training image has been manipulated, and to determine whether the target image contains a manipulated pixel,
Manipulated image discrimination device.

According to claim 1,
The DCT spatial information,
comprises per-pixel DCT coefficients of the target image and a quantization table used to compress the target image,
Manipulated image discrimination device.

According to claim 1,
The image processing unit,
extracts the DCT spatial information after cropping the target image so that its width and height are multiples of the horizontal and vertical pixel counts of the quantization table used to compress the target image,
Manipulated image discrimination device.

According to claim 1,
The image discriminator,
comprises a DCT neural network that extracts a first feature value of the target image by performing a convolution operation on a feature map generated from the DCT spatial information by a conversion module, the conversion module generating, from the per-pixel DCT coefficients of the target image, a feature map in which the DCT coefficients are separated by frequency,
Manipulated image discrimination device.

According to claim 4,
The conversion module,
comprises a binary volume conversion layer that generates, for the per-pixel DCT coefficients of the target image within a predetermined range, separate feature maps in each of which a binary value is assigned to the pixels having the same DCT coefficient value, and that concatenates the generated separate feature maps into a binary volume,
Manipulated image discrimination device.

According to claim 5,
The conversion module,
comprises a frequency-specific separation layer that divides the feature map generated from the binary volume into units the size of the quantization table used for the target image and generates a first frequency-specific feature map by extracting the same-frequency component from each quantization-table unit,
Manipulated image discrimination device.

According to claim 6,
The conversion module,
generates a block array feature map by tiling the quantization table to a size matching the feature map generated from the binary volume, multiplies the block array feature map element-wise with the feature map generated from the binary volume, extracts the components of each frequency from the product to obtain a second frequency-specific feature map, and joins the second frequency-specific feature map with the first frequency-specific feature map, thereby generating a feature map in which the per-pixel DCT coefficients of the target image are separated by frequency,
Manipulated image discrimination device.

According to claim 5,
The binary volume conversion layer,
generates, for the DCT coefficient at each two-dimensional pixel position (i, j) of the target image and for each value from 0 to T (a predetermined natural number) in order, a separate feature map whose values are set to 0 or 1 according to Equation 1 below, and concatenates the (T+1) generated feature maps,
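[Equation 1]

$$B_t(i, j) = \begin{cases} 1, & \text{if } \mathrm{abs}\big(\mathrm{clip}(M)_{ij}\big) = t \\ 0, & \text{otherwise} \end{cases} \qquad t = 0, 1, \ldots, T$$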

(M: per-pixel DCT coefficients of the target image; clip(M)_ij: a function that converts the DCT coefficient at coordinates (i, j) of M to T if it is greater than T and to -T if it is less than -T; abs(): the absolute value function)
Manipulated image discrimination device.

According to claim 4,
The image discriminator,
an RGB neural network that generates a second feature value of the target image by performing a convolution operation on a feature map generated from the per-pixel RGB information of the target image; and
a synthetic neural network that receives the first feature value and the second feature value as inputs and performs a convolution operation to generate a third feature value reflecting both the DCT information and the RGB information of the target image,
Manipulated image discrimination device.

A manipulated image discrimination method performed by a manipulated image discrimination device, the method comprising:
acquiring a target image;
extracting discrete cosine transform (DCT) spatial information of the target image; and
determining whether the target image contains a manipulated pixel by inputting the DCT spatial information of the target image to an image discriminator trained on a plurality of training images, each labeled with a class indicating whether the DCT spatial information included in the training image has been manipulated,
Manipulated image discrimination method.

According to claim 10,
The DCT spatial information,
comprises per-pixel DCT coefficients of the target image and a quantization table used to compress the target image,
Manipulated image discrimination method.

According to claim 10,
The step of extracting DCT spatial information of the target image,
comprises extracting the DCT spatial information after cropping the target image so that its width and height are multiples of the horizontal and vertical pixel counts of the quantization table used to compress the target image,
Manipulated image discrimination method.

According to claim 10,
The image discriminator,
comprises a DCT neural network that extracts a first feature value of the target image by performing a convolution operation on a feature map generated from the DCT spatial information by a conversion module, the conversion module generating, from the per-pixel DCT coefficients of the target image, a feature map in which the DCT coefficients are separated by frequency,
Manipulated image discrimination method.

According to claim 13,
The conversion module,
comprises a binary volume conversion layer that generates, for the per-pixel DCT coefficients of the target image within a predetermined range, separate feature maps in each of which a binary value is assigned to the pixels having the same DCT coefficient value, and that concatenates the generated separate feature maps into a binary volume,
Manipulated image discrimination method.

According to claim 14,
The conversion module,
comprises a frequency-specific separation layer that divides the feature map generated from the binary volume into units the size of the quantization table used for the target image and generates a first frequency-specific feature map by extracting the same-frequency component from each quantization-table unit,
Manipulated image discrimination method.

According to claim 13,
The image discriminator,
an RGB neural network that generates a second feature value of the target image by performing a convolution operation on a feature map generated from the per-pixel RGB information of the target image; and
a synthetic neural network that receives the first feature value and the second feature value as inputs and performs a convolution operation to generate a third feature value reflecting both the DCT information and the RGB information of the target image,
Manipulated image discrimination method.

A computer-readable recording medium storing a computer program,
acquiring a target image;
extracting discrete cosine transform (DCT) spatial information of the target image; and
determining whether the target image contains a manipulated pixel by inputting the DCT spatial information of the target image to an image discriminator trained on a plurality of training images, each labeled with a class indicating whether the DCT spatial information included in the training image has been manipulated,
the computer program comprising instructions for causing a processor to perform a manipulated image discrimination method including the above steps,
A computer-readable recording medium.

As a computer program stored on a computer-readable recording medium,
acquiring a target image;
extracting discrete cosine transform (DCT) spatial information of the target image; and
determining whether the target image contains a manipulated pixel by inputting the DCT spatial information of the target image to an image discriminator trained on a plurality of training images, each labeled with a class indicating whether the DCT spatial information included in the training image has been manipulated,
the computer program comprising instructions for causing a processor to perform a manipulated image discrimination method including the above steps,
Computer program.