KR102651594B1

KR102651594B1 - Content security system for securing multimedia files

Info

Publication number: KR102651594B1
Application number: KR1020230137129A
Authority: KR
Inventors: 한진구
Original assignee: 주식회사 바론아이티
Priority date: 2023-10-13
Filing date: 2023-10-13
Publication date: 2024-03-26

Abstract

The present invention relates to a content security system for securing multimedia files that receives multimedia files from receiving and sending sources, performs security operations on the received files according to a configured security policy, and then forwards them to the recipient.

Description

Content security system for securing multimedia files

The present invention relates to a content security system for securing multimedia files and, more specifically, to a system that receives multimedia files from receiving and sending sources, performs security operations on the received files according to a configured security policy, and then forwards them to the recipient.

Most companies today operate an in-house network to improve work efficiency.

Because an in-house network is used frequently by many users, harmful active content or code such as maliciously planted viruses and ransomware (hereinafter collectively 'harmful active content'), as distinct from the active content users actually need, spreads through the network very quickly.

Companies that operate in-house networks therefore consider various security systems to protect the network from harmful active content.

Active content is often distributed embedded in multimedia files that carry information users need (for example Word files, PDF documents, picture files, PPT files, Excel files, and image files), and it can spread quickly that way. A security system for multimedia files is therefore needed.

Multimedia files enter mainly through email, website downloads, recording media such as USB memory, and computers connected to the in-house network (including personal PCs, smartphones, and server computers). Accordingly, various security means optimized for each receiving path have been deployed. For example, the email path has a mail security means that handles spam and mail carrying malicious code, the website download path is covered by antivirus software, and the recording-media path has a media control means. The computer-to-computer path may have a network separation means.

When active content is embedded in a multimedia file, antivirus software checks whether that content is harmful by pattern matching, but this approach cannot catch variants or new malicious programs.

In addition, a multimedia file may not contain a malicious program directly. Instead, a macro or similar mechanism downloads and runs the malicious program when the user opens the file. Such files are hard to detect on the basis of patterns, and because the attack takes a new form with nothing more than a change to the macro code, it keeps functioning as a new threat. Pattern matching alone is therefore not a sufficient response.

A tool called the sandbox therefore emerged. A sandbox runs the active content embedded in a multimedia file inside a protected virtual environment and judges from its behavior whether the content is harmful. However, recently developed harmful active content is programmed to run only in a real environment and to stay dormant in a virtual one, so a sandbox can no longer protect the in-house network from malicious programs planted in multimedia files. In other words, sophisticated malware is evolving either to detect when it is running in a virtual environment and suspend its attack to evade detection, or to exploit vulnerabilities in essential programs (for example Java). Moreover, because execution in a virtual environment takes time, the sandbox is used only to complement existing security solutions rather than as a standalone solution.

Meanwhile, some multimedia files (for example Excel or PPT files) have active content (macros, scripts, OLE objects, and other executable functions or elements contained in the file) embedded by the author for convenience of viewing or use. If content disarm and reconstruction (CDR) deletes active content that the user needs, the user can no longer view or use the file properly.

Furthermore, because a multimedia file passes selectively through different security means such as antivirus software or CDR depending on its receiving path, only incomplete security is applied on each path. Content management and content security management become complicated as a result, security weakens, and the in-house network is likely to be exposed to intrusion by critically harmful active content.

The background art described above is technical information that the inventor possessed in order to derive the present invention or acquired in the course of deriving it, and it is not necessarily known art disclosed to the general public before the filing of the present application.

Korean Patent No. 10-2262680

One aspect of the present invention provides a content security system for securing multimedia files that receives multimedia files from receiving and sending sources, performs security operations on the received files according to a configured security policy, and then forwards them to the recipient.

The technical problems addressed by the present invention are not limited to the one mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

A content security system for securing multimedia files according to an embodiment of the present invention receives multimedia files from receiving and sending sources, performs security operations on the received files according to a configured security policy, and then forwards them to the recipient.

The content security system for securing multimedia files includes:

a receiving unit that receives multimedia files and sending/receiving information from security means provided on the paths coming from various receiving or sending sources;

a selection unit that selects, according to the security policy configured for a multimedia file received by the receiving means, the security operation to apply to that file;

at least one security unit that performs the security operation selected by the selection means on the multimedia file;

an execution unit that triggers the security means to perform the security operation selected by the selection means on the multimedia file; and

a sending unit that sends the multimedia file on which the security means has performed the security operation to the security means provided on the paths coming from the various receiving or sending sources.

The security unit

calculates a security requirement score for each multimedia file using the following equation and selectively encrypts only those multimedia files whose calculated score is equal to or greater than a preset reference value.

[Equation]

Here, S is the security requirement score, w_p is a first weight set differently for each type of multimedia file, g is a second weight set in proportion to the size of the multimedia file, v_s is a reference value set for each type of multimedia file, v_k is the magnitude of the embedding vector of the keyword set for the multimedia file being scored, v_a is the average magnitude of the embedding vectors of the keywords set for each of the other multimedia files in the same category as the file being scored, r is the number of files in the same category as the file being scored, and tr is a weight set for each recipient terminal that transmitted the multimedia file.

According to the aspect of the present invention described above, performing security operations on multimedia files received over various paths improves the reliability of content security, and performing those operations selectively according to the configured security policy reduces the resources and time that security requires.

FIG. 1 is a diagram showing a schematic configuration of a content security system for securing multimedia files according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show by way of example specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is a diagram showing a schematic configuration of a content security system for securing multimedia files according to an embodiment of the present invention.

The content security system for securing multimedia files according to the present invention receives multimedia files from receiving and sending sources, performs security operations on the received files according to a configured security policy, and then forwards them to the recipient.

To this end, the content security system according to the present invention is implemented in a file management server.

Specifically, the content security system according to an embodiment of the present invention, implemented in the file management server, includes a receiving unit (110), a selection unit (120), a security unit (130), an execution unit (140), and a sending unit (150).

The receiving unit (110) receives multimedia files and sending/receiving information from the mail security means, antivirus software, media control means, and network separation means provided on the paths coming from the various receiving sources.

The selection unit (120) selects the security operation to apply to a multimedia file received by the receiving unit (110) according to the security policy configured for that file, and determines how the file is to be processed.

The security unit performs the security operation selected by the selection unit on the multimedia file.

In one embodiment, the security unit calculates a security requirement score for each multimedia file using Equation 1 below.

[Equation 1]

Here, S is the security requirement score, w_1 is a first weight set differently for each type of multimedia file, w_2 is a second weight set in proportion to the size of the multimedia file, v_s is a reference value set for each category of multimedia file, v_k is the magnitude of the embedding vector of the keyword set for the multimedia file being scored, v_a is the average magnitude of the embedding vectors of the keywords set for each of the other multimedia files in the same category as the file being scored, r is the number of files in the same category as the file being scored, and tr is a weight set for each recipient terminal that transmitted the multimedia file.

Representing a word as a dense vector is called word embedding, and the dense vector produced by the word embedding process is called an embedding vector.

Word embedding methods include LSA, Word2Vec, FastText, and GloVe. Embedding(), a tool provided by Keras, does not use these methods. Instead, it converts each word into a dense vector with random values and then learns the word vectors in the same way that the weights of an artificial neural network are learned. An embedding vector is therefore expressed in real values.
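The quantities v_k and v_a used in the security requirement score are magnitudes of such embedding vectors. The following is a minimal sketch of how they could be computed, assuming that "magnitude" means the Euclidean (L2) norm, which the text does not specify. The function names and the three-dimensional sample vectors are hypothetical and chosen only for illustration.

```python
import math


def magnitude(vec):
    """Euclidean (L2) norm of an embedding vector (assumed meaning of 'magnitude')."""
    return math.sqrt(sum(x * x for x in vec))


def mean_magnitude(vectors):
    """Average norm over the keyword embeddings of the other files in a category."""
    if not vectors:
        raise ValueError("category contains no other files")
    return sum(magnitude(v) for v in vectors) / len(vectors)


# Hypothetical keyword embeddings; real embeddings have many more dimensions.
target_keyword_vec = [3.0, 4.0, 0.0]
other_keyword_vecs = [[1.0, 2.0, 2.0], [0.0, 6.0, 8.0]]

v_k = magnitude(target_keyword_vec)       # norm of the scored file's keyword vector
v_a = mean_magnitude(other_keyword_vecs)  # mean norm across the rest of the category
print(v_k, v_a)
```

How v_k and v_a are combined into S is defined by Equation 1, which is not reproduced in this text, so the sketch stops at the two inputs.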

In this way, the security unit calculates a security requirement score for each multimedia file, encrypts and manages the files whose score is equal to or greater than the preset reference value, and manages the files whose score is below the reference value without a separate encryption step. Performing security operations selectively according to the configured security policy reduces the resources and time those operations require.
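The selective step described above amounts to a threshold test on the score. The sketch below assumes the score S has already been computed for each file. The `ScoredFile` record, the function name, and the sample scores and threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredFile:
    name: str
    score: float  # security requirement score S, assumed precomputed


def split_by_reference_value(files, reference_value):
    """Return (to_encrypt, to_leave_plain) using the rule score >= reference_value."""
    to_encrypt = [f for f in files if f.score >= reference_value]
    to_leave_plain = [f for f in files if f.score < reference_value]
    return to_encrypt, to_leave_plain


# Hypothetical scores and reference value, for illustration only.
files = [
    ScoredFile("contract.pdf", 8.2),
    ScoredFile("lunch_menu.png", 1.4),
    ScoredFile("payroll.xlsx", 5.0),
]
to_encrypt, to_leave_plain = split_by_reference_value(files, reference_value=5.0)
print([f.name for f in to_encrypt])
print([f.name for f in to_leave_plain])
```

A file whose score equals the reference value is encrypted, matching the "equal to or greater than" wording.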

The execution unit controls the security means so that the security operation selected by the selection unit is performed on the multimedia file.

The sending unit sends the multimedia file on which the security unit has performed the security operation to the security means provided on the paths coming from the various receiving or sending sources.
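The flow through the units described above (receive, select, execute, send) can be sketched as follows. This is an illustration only: the `Received` record, the `POLICY` table keyed by receiving path, and the task names `scan` and `encrypt` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Received:
    """A file handed over by a path-side security means (hypothetical record)."""
    name: str
    path: str  # receiving path, e.g. "mail", "web", "media", "network"
    applied: List[str] = field(default_factory=list)


# Hypothetical security policy: operations to apply per receiving path.
POLICY: Dict[str, List[str]] = {
    "mail": ["scan", "encrypt"],
    "media": ["scan"],
}


def scan(item: Received) -> None:
    item.applied.append("scan")  # stand-in for a real inspection step


def encrypt(item: Received) -> None:
    item.applied.append("encrypt")  # stand-in for a real encryption step


# Security unit: the operations that can be performed on a file.
SECURITY_TASKS: Dict[str, Callable[[Received], None]] = {
    "scan": scan,
    "encrypt": encrypt,
}


def select_tasks(item: Received) -> List[str]:
    """Selection unit: choose operations for this file from the policy."""
    return POLICY.get(item.path, [])


def execute(item: Received, task_names: List[str]) -> Received:
    """Execution unit: drive the security unit for each selected operation."""
    for task_name in task_names:
        SECURITY_TASKS[task_name](item)
    return item


def process(item: Received) -> Received:
    """Receive, select, execute, then return the file ready for sending."""
    return execute(item, select_tasks(item))


result = process(Received(name="report.xlsx", path="mail"))
print(result.applied)
```

Keeping selection separate from execution mirrors the split between the selection unit and the execution unit, so the policy table can change without touching the operations themselves.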

In Equation 1 above, the weight tr set for each recipient terminal that transmitted the multimedia file may be calculated as follows.

[Equation 2]

Here, tr is the weight set for each recipient terminal that transmitted the multimedia file, p is the total number of times the recipient terminal has transmitted multimedia files, m is the number of multimedia files transmitted by the recipient terminal that were encrypted by the security unit, q is the average rating received from other terminals for the multimedia files transmitted by the recipient terminal, and b is a weight set according to the user's credit rating.

For example, when p is 10, m is 5, q is 4, and b is 7, tr is calculated as approximately 3.3. By using Equation 2 to set a different weight for each user terminal, the security unit can improve the security reliability of multimedia files.

In some other embodiments, the system according to the present invention may further include a data collection unit that analyzes the keywords set for each multimedia file and manages the multimedia files by classifying them by type.

To recognize classification data in text form, the data collection unit may train on training data with the Word2Vec algorithm and build a neural network that extracts context information from input data.

The Word2Vec algorithm may include a neural network language model (NNLM). A neural network language model is basically a neural network consisting of an input layer, a projection layer, a hidden layer, and an output layer, and it is used to vectorize words. Because the neural network language model is a known technique, a more detailed description is omitted.

The Word2vec algorithm is a text mining algorithm that determines how close words are by looking at the words that precede and follow each of them. It is an unsupervised learning algorithm. As its name indicates, it is a quantitative technique that expresses the meaning of a word as a vector. The Word2vec algorithm can represent each word as a vector in a space of about 200 dimensions, so a vector can be obtained for every word.

Compared with earlier algorithms, the Word2vec algorithm can enable a dramatic improvement in precision in natural language processing. Word2vec learns the meaning of a word from the relationship between that word and its adjacent words in the sentences of the input corpus. The algorithm is based on an artificial neural network and starts from the premise that words appearing in the same context have similar meanings. It learns from text documents: for each word, the other words that appear nearby (about 5 to 10 words before or after) are presented to the neural network as related words. Because words with related meanings are likely to appear close together in a document, two such words gradually acquire closer vectors as training is repeated.

The Word2vec algorithm has two training methods: CBOW (Continuous Bag of Words) and skip-gram. CBOW predicts a target word from the context formed by its surrounding words. Skip-gram predicts the words that may appear around a given word. Skip-gram is known to be more accurate on large datasets.
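The difference between the two training methods is easiest to see in the training examples each one builds from a sentence. The sketch below generates those examples only and does not train a model. The function names, the window size, and the sample sentence are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each (center, context) pair asks the model to predict a neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: each (context_words, target) example predicts the center from neighbors."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


tokens = ["scan", "macro", "before", "opening"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

With a window of 1, the word "macro" yields two skip-gram pairs, one per neighbor, but a single CBOW example whose context holds both neighbors.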

Accordingly, embodiments of the present invention use the Word2vec algorithm with the skip-gram method. Once training with the Word2vec algorithm has completed successfully, similar words are located near each other in the high-dimensional space. Under the Word2vec algorithm described above, the more similar the distributions of surrounding words for two words in the training documents, the more similar their computed vectors become, and words with similar vectors can be regarded as similar. Because the Word2vec algorithm is a known technique, a more detailed description of the vector calculation is omitted.

The data collection unit (210) may input text extracted from image data received from the worker terminal (100) into the neural network and extract an evaluation result vector value representing context information.

The data collection unit (210) may calculate the similarity between the evaluation result vector value and each of a plurality of reference vector values, and extract the reference vector value with the highest similarity to the evaluation result vector value. The similarity may be calculated using Euclidean distance, cosine similarity, the Tanimoto coefficient, or a similar measure.

The data collection unit (210) may extract the word corresponding to the reference vector value with the highest similarity to the evaluation result vector value as the word corresponding to the recognized text.
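The three similarity measures named above, and the selection of the closest reference vector, can be sketched as follows. The sketch uses cosine similarity for the final selection, which is one of the options the text lists rather than a stated requirement. The function names, reference words, and two-dimensional vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Larger means more similar; returns 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def tanimoto_coefficient(a, b):
    """Continuous Tanimoto coefficient; returns 0.0 if both vectors are all zeros."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0


def most_similar_word(query, references):
    """Return the reference word whose vector has the highest cosine similarity."""
    return max(references, key=lambda word: cosine_similarity(query, references[word]))


# Hypothetical reference vectors and evaluation result vector.
references = {"invoice": [1.0, 0.0], "macro": [0.0, 1.0]}
query = [0.9, 0.1]
best = most_similar_word(query, references)
print(best)
```

If Euclidean distance were used instead, the selection would take the minimum rather than the maximum, since it measures dissimilarity.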

The data collection unit (210) may also train an artificial neural network or use one that has already been trained. A processor can train or run an artificial neural network stored in memory, and the memory can store the trained network. The electronic device that trains the network and the device that uses it may be the same or separate. Artificial intelligence is a computer system that implements some functions of the human brain and can learn, infer, and make decisions on its own. As training progresses, the probability of producing the right answer can increase. Artificial intelligence consists of learning and the element technologies that use it. Learning is the algorithmic technique of classifying and learning features from input data, and the element technologies use learning algorithms to implement some functions of the human brain.

Artificial intelligence is well suited to problems that admit several answers probabilistically, and it can infer logically and probabilistically the optimal cycle, method, or plan for given input data. Its inference techniques include judging input data, optimization and prediction, knowledge-based and probability-based reasoning, and preference-based planning.

An artificial neural network is a learning algorithm in the field of machine learning that implements the connections between neurons and synapses in the brain as a program. The structure of the network is built in software and then trained to acquire the desired function. Errors may exist, but by training on large amounts of data the network can produce appropriate output for given input. It has the advantages of yielding output that has statistically given good results and of resembling human reasoning.

The data collection unit (210) may build the query/metric dataset required for training using an artificial intelligence algorithm built on big data, and for this purpose may include a number of pre-trained artificial neural networks.

In this embodiment, the relay server may include a number of pre-trained artificial neural networks for running machine learning algorithms. Machine learning produces output data from input data and can use the results to learn on its own, which can improve its data processing ability over time. An artificial neural network extracts features from input data, infers regularities, and outputs result data, and the reliability of the result data increases as this process accumulates.

In this embodiment, the artificial neural network may be an algorithm that outputs text data from feature data comprising at least one of the shape, length, number, and height difference of an object recognized as text. The network can infer the best output data using big data as input either as is or after preprocessing that removes unnecessary data.

Artificial intelligence machine learning models are classified by learning type into supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and the like. Machine learning algorithms that may be used include Decision Tree, K-Nearest Neighbor, Artificial Neural Network, Support Vector Machine, Ensemble Learning, Gradient Descent, Naïve Bayes Classifier, Hidden Markov Model, and K-Means Clustering.
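To make one of the listed algorithms concrete, the following is a minimal sketch of a K-Nearest Neighbor classifier. The two-dimensional feature vectors and the "benign"/"harmful" labels are hypothetical illustration data, not values from the specification, which does not state which algorithm or features an embodiment would use.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled feature vectors (assumed for illustration only).
TRAINING = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "harmful"),
    ((5.2, 4.9), "harmful"),
    ((4.8, 5.1), "harmful"),
]

label_low = knn_classify(TRAINING, (1.1, 1.0))
label_high = knn_classify(TRAINING, (5.0, 4.8))
print(label_low, label_high)
```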

The artificial neural network may be pre-trained on the various input values that may be included in the input data. The artificial neural network may be trained by reinforcement learning, which is one of the learning methods. Reinforcement learning sets rewards and constraints so as to gradually raise the probability of obtaining a correct result. The artificial neural network may also be modeled on a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN).
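The idea of setting rewards and constraints can be sketched with tabular Q-learning on a toy problem. Everything below is an assumption made for illustration: the four-cell track, the reward of 1 at the goal, the penalty of -1 for trying to leave the track (standing in for a "constraint"), and the hyperparameters. For reproducibility the sketch sweeps every state-action pair instead of exploring at random, and it uses a lookup table rather than a neural network.

```python
GOAL = 3             # terminal cell of a hypothetical track with cells 0..3
ACTIONS = (-1, +1)   # move one cell left or right
GAMMA = 0.9          # discount factor (assumed)
ALPHA = 0.5          # learning rate (assumed)


def step(state, action):
    """Return (next_state, reward) for one move on the track."""
    nxt = state + action
    if nxt < 0:
        return state, -1.0   # constraint: penalty for leaving the track
    if nxt == GOAL:
        return nxt, 1.0      # reward for reaching the goal
    return nxt, 0.0


def train(sweeps=200):
    """Apply the Q-learning update over all state-action pairs repeatedly."""
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(GOAL):
            for a in ACTIONS:
                nxt, reward = step(s, a)
                # No future value once the terminal cell is reached.
                future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])
    return q


q_table = train()
# Greedy policy: in each cell, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the greedy policy moves right in every cell, which is the behavior the reward and penalty were set up to encourage.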

In this way, the data collection unit may use big data and artificial neural networks to extract measurement data from inspection result sheets collected in image form during a facility inspection process.

The technology according to the present invention may be implemented as an application, or in the form of program instructions executable through various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the following claims.

110: Receiving unit
120: Selection unit
130: Security unit
140: Execution unit
150: Sending unit

Claims

A content security system for multimedia file security, which receives multimedia files from receiving and sending sources, performs a security operation by applying a set security policy to the received multimedia files, and then transmits them to a recipient,
the content security system for multimedia file security comprising:
a receiving unit that receives multimedia files and incoming/outgoing information from security means provided on paths from various receiving or sending sources;
a selection unit that selects a security operation to be applied to the multimedia file according to a security policy set for the multimedia file received by the receiving unit;
at least one security unit that performs the security operation selected by the selection unit on the corresponding multimedia file;
an execution unit that executes the security means so that the security operation selected by the selection unit is performed on the corresponding multimedia file; and
a sending unit that transmits the multimedia file on which the security operation has been performed by the security unit to the security means provided on the paths from the various receiving or sending sources,
wherein the security unit
calculates a security requirement score for each multimedia file using the following equation, and selectively encrypts only multimedia files whose calculated security requirement score is higher than a preset reference value.

[Equation]

Here, S is the security requirement score; w_1 is a first weight set differently for each type of multimedia file; w_2 is a second weight set in proportion to the size of the multimedia file; v_s is a reference value set for each category of multimedia file; v_k is the magnitude of the keyword embedding vector set for the multimedia file whose security requirement score is to be calculated; v_a is the average magnitude of the keyword embedding vectors set for each of the other multimedia files in the same category as that multimedia file; r is the number of files in the same category as that multimedia file; and tr is a weight set for each recipient terminal that transmitted the multimedia file.
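The selective-encryption step recited above can be sketched as follows. Because the equation itself is not reproduced in this text, the sketch treats the security requirement score S as an already-computed input and does not attempt to implement the formula. The names `ScoredFile`, `apply_selective_encryption`, and `placeholder_cipher` are hypothetical, the strict "greater than" comparison follows the wording "higher than a preset reference value", and the placeholder cipher is not real encryption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ScoredFile:
    """A multimedia file with a precomputed security requirement score S."""
    name: str
    payload: bytes
    score: float  # assumed to come from the claimed equation, computed elsewhere


def apply_selective_encryption(
    files: List[ScoredFile],
    threshold: float,
    encrypt: Callable[[bytes], bytes],
) -> List[Tuple[str, bytes, bool]]:
    """Encrypt only files whose score is higher than the reference value.

    Returns (name, payload, was_encrypted) tuples in input order.
    """
    results = []
    for f in files:
        if f.score > threshold:
            results.append((f.name, encrypt(f.payload), True))
        else:
            results.append((f.name, f.payload, False))
    return results


def placeholder_cipher(data: bytes) -> bytes:
    """Stand-in for a real cipher: reverses the bytes so the effect is visible."""
    return data[::-1]


# Hypothetical files and scores, for illustration only.
files = [
    ScoredFile("memo.txt", b"hello", 0.2),
    ScoredFile("contract.mp4", b"secret", 0.9),
]
outcome = apply_selective_encryption(files, threshold=0.5, encrypt=placeholder_cipher)
print(outcome)
```

Passing the cipher in as a callable keeps the thresholding logic independent of whichever encryption scheme a deployment actually uses.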

(Deleted)