KR102598072B1

KR102598072B1 - Apparatus and method for providing service for preventing personal information exposure based on artificial intelligence algorithm

Info

Publication number: KR102598072B1
Application number: KR1020230036538A
Authority: KR
Inventors: 김재문; 노성운; 김상용
Original assignee: (주)노웨어소프트
Priority date: 2023-03-21
Filing date: 2023-03-21
Publication date: 2023-11-06

Abstract

A method and apparatus for providing a service that prevents personal information exposure based on an AI algorithm are disclosed. According to an embodiment of the present disclosure, the method, performed by an apparatus, may include: based on first content being transmitted from a terminal device used by a user, extracting at least one of first text data or first voice data included in the first content; inputting at least one of the first text data and second text data corresponding to the first voice data into a pre-trained first AI model to obtain information related to one or more pieces of personal information included in the first text data or the second text data; de-identifying the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content and then uploading the first content; transmitting, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information de-identified in the first content; and, based on receiving from the terminal device used by the user a second message requesting release of the de-identification of a first type of personal information, releasing the de-identification of the first type of personal information among the one or more pieces of personal information in the first content.

Description

Apparatus and method for providing a service for preventing personal information exposure based on an artificial intelligence algorithm

The present disclosure relates to the field of preventing personal information exposure and, more specifically, to an apparatus and method for providing a service that prevents personal information exposure based on an artificial intelligence algorithm.

Personal information refers to information unique to an individual that allows the individual to be identified. For example, personal information may be classified into resident registration numbers, mobile phone numbers, credit card numbers, IP addresses, IDs, passwords, passport numbers, driver's license numbers, and the like. Personal information is important not only because it can identify an individual but also because it is valuable information that is essential for corporate activities.

Recently, personal information leaks have become more serious, and cases in which exposed personal information is misused are increasing rapidly. As services that use personal information and their related industries grow exponentially, the need to prevent personal information leaks is becoming increasingly apparent.

Korean Registered Patent Publication No. 10-1021305, 2011.03.03

The present disclosure was devised to solve the problems described above, and an object of the present disclosure is to provide an apparatus and method for providing a service that prevents personal information exposure based on an artificial intelligence algorithm.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to an embodiment of the present disclosure, a method, performed by an apparatus, of providing a service that prevents personal information exposure based on an artificial intelligence (AI) algorithm may include: based on first content being transmitted from a terminal device used by a user, extracting at least one of first text data or first voice data included in the first content; inputting at least one of the first text data and second text data corresponding to the first voice data into a pre-trained first AI model to obtain information related to one or more pieces of personal information included in the first text data or the second text data; de-identifying the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content and then uploading the first content; transmitting, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information de-identified in the first content; and, based on receiving from the terminal device used by the user a second message requesting release of the de-identification of a first type of personal information, releasing the de-identification of the first type of personal information among the one or more pieces of personal information in the first content.
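The flow of the method summarized above can be illustrated with a minimal sketch. It is not the disclosed implementation: the regular-expression detectors stand in for the pre-trained first AI model, masking with asterisks stands in for de-identification, and the names `PATTERNS`, `Finding`, `detect`, `render`, and `notify` are hypothetical. The sketch also assumes that the original values are retained on the server side so that de-identification of one type can later be released.

```python
import re
from dataclasses import dataclass

# Assumption: regex detectors replace the pre-trained first AI model
# purely for illustration; the type names are hypothetical.
PATTERNS = {
    "phone": re.compile(r"\b01[016789]-\d{3,4}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class Finding:
    kind: str    # type of personal information
    start: int   # start offset in the text
    end: int     # end offset (exclusive)
    value: str   # original value, kept so masking can be released later


def detect(text):
    """Return personal-information findings ordered by position."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append(Finding(kind, m.start(), m.end(), m.group()))
    return sorted(findings, key=lambda f: f.start)


def render(text, findings, released=frozenset()):
    """Mask every finding whose type is not in `released`."""
    out, pos = [], 0
    for f in findings:
        out.append(text[pos:f.start])
        out.append(f.value if f.kind in released else "*" * len(f.value))
        pos = f.end
    out.append(text[pos:])
    return "".join(out)


def notify(findings):
    """Build the 'first message' listing which types were de-identified."""
    return {"de_identified": sorted({f.kind for f in findings})}
```

For example, `render(text, detect(text))` yields the version that would be uploaded, and calling `render` again with `released={"email"}` corresponds to handling a second message that asks for e-mail addresses to be shown.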

Further, the method may further include obtaining the second text data by applying a speech-to-text (STT) algorithm to the first voice data extracted from the first content.

Further, based on a plurality of image frames constituting the first content being input into a pre-trained second AI model, the second AI model may identify a plurality of texts displayed in each of the plurality of image frames and output, as the first text data, the texts remaining after duplicate texts are excluded from the plurality of texts.
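The exclusion of duplicate texts across frames can be sketched as a post-processing step. This is an illustration under assumptions: in the disclosure the second AI model itself outputs the deduplicated text, whereas here a hypothetical helper `merge_frame_texts` receives per-frame recognition results, and the whitespace and case normalization used to decide that two texts are "the same" is a choice made for this example only.

```python
def merge_frame_texts(frame_texts):
    """Merge texts recognized in successive frames, dropping duplicates.

    `frame_texts` is a list with one list of strings per image frame.
    Texts are returned in first-seen order. Two texts count as duplicates
    when they match after collapsing whitespace and ignoring case
    (an assumption made for this sketch).
    """
    seen = set()
    merged = []
    for texts in frame_texts:
        for text in texts:
            key = " ".join(text.split()).casefold()
            if key and key not in seen:
                seen.add(key)
                merged.append(text.strip())
    return merged
```

Subtitles or on-screen captions usually persist over many consecutive frames, so deduplicating before the text reaches the first AI model avoids analyzing the same string repeatedly.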

Further, the second AI model may include at least one of one or more convolutional layers, one or more long short-term memory (LSTM) layers, and a softmax layer.

Further, the method may further include: based on second content being received from the terminal device used by the user, extracting third text data and second voice data included in the second content; inputting at least one of the third text data and fourth text data corresponding to the second voice data into the first AI model to obtain information related to one or more pieces of personal information included in the third text data or the fourth text data; and de-identifying, in at least one of the third text data or the second voice data included in the second content, the personal information other than the first type of personal information, and then uploading the second content.
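The handling of subsequent content can be sketched as follows: once a user has asked for a type of personal information to be left visible, later uploads from that user skip that type. The in-memory `released_types` mapping, the function names, and the `(kind, start, end)` span format are assumptions for illustration; the disclosure does not specify how the preference is stored.

```python
from collections import defaultdict

# Hypothetical store: user id -> types the user asked to leave visible.
released_types = defaultdict(set)


def handle_release_request(user_id, kind):
    """Record a 'second message' asking to release one type."""
    released_types[user_id].add(kind)


def mask_for_user(user_id, text, findings):
    """Mask all findings except the types this user has released.

    `findings` is a list of non-overlapping (kind, start, end) spans,
    for example as produced by a personal-information detector.
    """
    keep = released_types[user_id]
    out, pos = [], 0
    for kind, start, end in sorted(findings, key=lambda f: f[1]):
        out.append(text[pos:start])
        span = text[start:end]
        out.append(span if kind in keep else "*" * len(span))
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

Keeping the preference per user means a release request made for the first content carries over to the second content without the user repeating it.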

Further, the method may further include: based on receiving, from the terminal device used by the user, a third message indicating that a specific word included in at least one of the first text data or the first voice data is a second type of personal information, de-identifying the specific word in the first content; and further training the first AI model based on the third message.
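The user-feedback step can be sketched as below. The function masks the word the user reported and queues a labeled example that could later be used to further train the first AI model. The queue format (`text`, `spans`, `label`) and the names `find_all` and `handle_third_message` are hypothetical, and the actual training step is outside the scope of this sketch.

```python
def find_all(text, word):
    """Return start offsets of every non-overlapping occurrence of `word`."""
    starts = []
    i = text.find(word)
    while i != -1:
        starts.append(i)
        i = text.find(word, i + len(word))
    return starts


def handle_third_message(text, word, kind, training_queue):
    """Mask a user-reported word and queue a labeled training example.

    `kind` is the type of personal information the user assigned to `word`.
    Returns the text with every occurrence of `word` masked.
    """
    if not word:
        raise ValueError("word must be non-empty")
    starts = find_all(text, word)
    training_queue.append({
        "text": text,
        "spans": [(s, s + len(word)) for s in starts],
        "label": kind,
    })
    return text.replace(word, "*" * len(word))
```

Because the queued example holds the unmasked text, a real system would need to protect that queue as carefully as the original content.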

In another embodiment of the present disclosure, an apparatus for providing a service that prevents personal information exposure based on an artificial intelligence (AI) algorithm includes at least one memory and at least one processor, wherein the at least one processor may include a personal information filtering module that: based on first content being transmitted from a terminal device used by a user, extracts at least one of first text data or first voice data included in the first content; inputs at least one of the first text data and second text data corresponding to the first voice data into a pre-trained first AI model to obtain information related to one or more pieces of personal information included in the first text data or the second text data; de-identifies the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content and then uploads the first content; transmits, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information de-identified in the first content; and, based on receiving from the terminal device used by the user a second message requesting release of the de-identification of a first type of personal information, releases the de-identification of the first type of personal information among the one or more pieces of personal information in the first content.

Further, the personal information filtering module may obtain the second text data by applying a speech-to-text (STT) algorithm to the first voice data extracted from the first content.

Further, the personal information filtering module may: based on second content being received from the terminal device used by the user, extract third text data and second voice data included in the second content; input at least one of the third text data and fourth text data corresponding to the second voice data into the first AI model to obtain information related to one or more pieces of personal information included in the third text data or the fourth text data; and de-identify, in at least one of the third text data or the second voice data included in the second content, the personal information other than the first type of personal information, and then upload the second content.

Further, based on receiving, from the terminal device used by the user, a third message indicating that a specific word included in at least one of the first text data or the first voice data is a second type of personal information, the personal information filtering module may de-identify the specific word in the first content, and an AI model training module included in the at least one processor may further train the first AI model based on the third message.

In addition, a computer program stored in a computer-readable recording medium for executing a method for implementing the present disclosure may be further provided.

In addition, a computer-readable recording medium on which a computer program for executing a method for implementing the present disclosure is recorded may be further provided.

According to various embodiments of the present disclosure, an apparatus and method for providing a service that prevents personal information exposure based on an artificial intelligence algorithm can be provided.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram schematically illustrating a system for preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating the configuration of an apparatus for preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating the structure of a second AI model, according to an embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating a method of preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

Like reference numerals refer to like elements throughout the present disclosure. The present disclosure does not describe every element of the embodiments, and content that is general in the technical field to which the present disclosure pertains or that overlaps between embodiments is omitted. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and, depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated to the contrary.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two members.

Terms such as "first" and "second" are used to distinguish one component from another, and the components are not limited by these terms.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

The identification codes of the steps are used for convenience of explanation and do not describe the order of the steps; unless a specific order is clearly stated in the context, the steps may be performed in an order different from the stated order.

Hereinafter, the operating principle and embodiments of the present disclosure are described with reference to the accompanying drawings.

In this specification, the 'apparatus according to the present disclosure' includes any of various devices capable of performing computation and providing results to a user. For example, the apparatus according to the present disclosure may include all of a computer, a server device, and a portable terminal, or may take the form of any one of them.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, slate PC, or the like equipped with a web browser.

Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory. The processor may consist of one or more processors. Here, the one or more processors may be a general-purpose processor such as a CPU, an AP, or a digital signal processor (DSP), a graphics-dedicated processor such as a GPU or a vision processing unit (VPU), or an AI-dedicated processor such as an NPU. The one or more processors control input data to be processed according to predefined operation rules or an artificial intelligence model stored in the memory. Alternatively, when the one or more processors are AI-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rules or artificial intelligence model are characterized by being created through learning. Here, being created through learning means that a basic artificial intelligence model is trained by a learning algorithm on a large amount of training data, so that predefined operation rules or an artificial intelligence model configured to perform a desired characteristic (or purpose) are created. Such learning may be performed on the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these examples.

An artificial intelligence model may be composed of a plurality of neural network layers. Each of the neural network layers has a plurality of weight values and performs a neural network operation by combining the operation result of the previous layer with its weights. The weights of the neural network layers may be optimized by the training result of the artificial intelligence model. For example, the weights may be updated so that a loss value or cost value obtained from the artificial intelligence model during training is reduced or minimized. The artificial neural network may include a deep neural network (DNN), for example a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, but is not limited to these examples.
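The idea of updating weights so that a loss value decreases can be illustrated with plain gradient descent on a one-variable linear model. This is a generic textbook sketch, not the training procedure of the disclosed models; the function name `train_linear`, the mean-squared-error loss, the learning rate, and the step count are all choices made for this example.

```python
def train_linear(xs, ys, lr=0.05, steps=500):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error.

    Returns the learned weight, bias, and the loss recorded at each step.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        # Forward pass: predictions and per-sample errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Update step: move the weights against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses
```

With training data drawn from y = 2x + 1, the learned `w` and `b` approach 2 and 1 and the recorded loss shrinks over the steps, which is the behavior the paragraph above describes for much larger models.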

According to an exemplary embodiment of the present disclosure, the processor may implement artificial intelligence. Artificial intelligence here refers to a machine learning method based on an artificial neural network, in which a machine learns by imitating biological neurons. By learning method, artificial intelligence methodologies may be divided into supervised learning, in which input data and output data are provided together as training data so that the answer (output data) to a problem (input data) is given; unsupervised learning, in which only input data is provided without output data so that the answer to the problem is not given; and reinforcement learning, in which a reward is given by the external environment whenever an action is taken in the current state and learning proceeds in the direction that maximizes the reward. Artificial intelligence methodologies may also be divided by architecture, that is, the structure of the learning model. Widely used deep learning architectures include the convolutional neural network (CNN), the recurrent neural network (RNN), the transformer, and the generative adversarial network (GAN).

The apparatus and system may include an artificial intelligence model. The artificial intelligence model may be a single model or may be implemented as a plurality of models. The artificial intelligence model may be composed of a neural network (or artificial neural network) and may include a statistical learning algorithm, used in machine learning and cognitive science, that mimics biological neurons. A neural network may refer to any model in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning. A neuron of the neural network may include a combination of weights or biases. The neural network may include one or more layers, each consisting of one or more neurons or nodes. By way of example, the apparatus may include an input layer, a hidden layer, and an output layer. The neural network constituting the apparatus can infer the desired output from an arbitrary input by changing the weights of its neurons through learning.
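The input layer, hidden layer, and output layer structure mentioned above can be illustrated with a minimal forward pass. This is a generic sketch: the ReLU activation, the softmax output, the layer sizes, and the parameter dictionary layout are assumptions for the example and are not taken from the disclosure.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert scores to probabilities; subtracting the max avoids overflow."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, params):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense(x, params["w1"], params["b1"]))
    return softmax(dense(hidden, params["w2"], params["b2"]))
```

Training would adjust the entries of `w1`, `b1`, `w2`, and `b2`; inference only evaluates `forward` with the learned values.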

The processor may create a neural network, train (or learn) a neural network, perform an operation based on received input data and generate an information signal based on the result, or retrain a neural network. Neural network models may include various types of models, such as a convolutional neural network (CNN) like GoogleNet, AlexNet, or VGG Network, a region with convolutional neural network (R-CNN), a region proposal network (RPN), a recurrent neural network (RNN), a stacking-based deep neural network (S-DNN), a state-space dynamic neural network (S-SDNN), a deconvolution network, a deep belief network (DBN), a restricted Boltzmann machine (RBM), a fully convolutional network, a long short-term memory (LSTM) network, and a classification network, but are not limited thereto. The processor may include one or more processors for performing operations according to the neural network models. For example, the neural network may include a deep neural network.

Those skilled in the art will understand that the neural network may include any neural network, including but not limited to CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), perceptron, multilayer perceptron, FF (Feed Forward), RBF (Radial Basis Network), DFF (Deep Feed Forward), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), AE (Auto Encoder), VAE (Variational Auto Encoder), DAE (Denoising Auto Encoder), SAE (Sparse Auto Encoder), MC (Markov Chain), HN (Hopfield Network), BM (Boltzmann Machine), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), DCN (Deep Convolutional Network), DN (Deconvolutional Network), DCIGN (Deep Convolutional Inverse Graphics Network), GAN (Generative Adversarial Network), LSM (Liquid State Machine), ELM (Extreme Learning Machine), ESN (Echo State Network), DRN (Deep Residual Network), DNC (Differentiable Neural Computer), NTM (Neural Turing Machine), CN (Capsule Network), KN (Kohonen Network), and AN (Attention Network).

According to an exemplary embodiment of the present disclosure, the processor may use various artificial intelligence structures and algorithms, including but not limited to: a convolutional neural network (CNN) such as GoogleNet, AlexNet, or VGG Network, a region with convolutional neural network (R-CNN), a region proposal network (RPN), a recurrent neural network (RNN), a stacking-based deep neural network (S-DNN), a state-space dynamic neural network (S-SDNN), a deconvolution network, a deep belief network (DBN), a restricted Boltzmann machine (RBM), a fully convolutional network, a long short-term memory (LSTM) network, a classification network, Generative Modeling, eXplainable AI, Continual AI, Representation Learning, and AI for Material Design; BERT, SP-BERT, MRC/QA, Text Analysis, Dialog System, GPT-3, and GPT-4 for natural language processing; Visual Analytics, Visual Understanding, Video Synthesis, and ResNet for vision processing; and Anomaly Detection, Prediction, Time-Series Forecasting, Optimization, Recommendation, and Data Creation for data intelligence.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically illustrating a system for preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

As shown in FIG. 1, the system 1000 for preventing personal information exposure based on an artificial intelligence algorithm may include a device 100, terminal devices 200-1, 200-2, ... 200-N used by a plurality of users (where N is a natural number of 2 or more), and a data server 300.

FIG. 1 shows the device 100 implemented as a desktop and the terminal devices 200-1, 200-2, ... 200-N used by the plurality of users implemented as smartphones, but the present disclosure is not limited thereto.

As an example, the device 100 and the terminal devices 200-1, 200-2, ... 200-N used by the plurality of users may be implemented as various types of electronic devices (e.g., a notebook, desktop, laptop, tablet PC, slate PC, or server device), and may also be implemented as a device group in which one or more types of devices are connected.

The device 100 and the terminal devices 200-1, 200-2, ... 200-N used by the plurality of users, which are included in the system 1000, may communicate with one another through a network W.

Here, the network W may include a wired network and a wireless network. For example, the network may include various networks such as a local area network (LAN), a metropolitan area network (MAN), and a wide area network (WAN).

The network W may also include the well-known World Wide Web (WWW). However, the network W according to an embodiment of the present disclosure is not limited to the networks listed above, and may include, at least in part, a known wireless data network, a known telephone network, or a known wired or wireless television network.

The device 100 may perform operations that prevent personal information exposure based on an artificial intelligence algorithm. Additionally, the device 100 may generate and/or execute web-based and/or application-based software capable of performing a service that prevents personal information exposure based on an artificial intelligence algorithm.

The terminal devices 200-1, 200-2, ... 200-N used by the plurality of users may execute the web-based and/or application-based software generated by the device 100. Accordingly, the method of preventing personal information exposure based on an artificial intelligence algorithm, described in the present disclosure as executed by the device 100, may also be executed by the terminal devices 200-1, 200-2, ... 200-N used by the plurality of users.

The device 100 may identify one or more pieces of personal information included in content input by a user. For example, the device 100 may obtain text data displayed on a plurality of frames constituting the content and/or text data corresponding to voice data. The device 100 may identify personal information in the obtained text data based on an artificial intelligence algorithm, filter/de-identify the identified personal information, and then upload the content.

The related operations are described in detail with reference to FIGS. 2 to 4.

FIG. 2 is a block diagram illustrating the configuration of a device for preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

As shown in FIG. 2, the device 100 may include a memory 110, a communication module 120, a display 130, and a processor 140.

The memory 110 may store one or more instructions for the processor 140 to perform various operations. The memory 110 may store data supporting various functions of the device 100 and programs for the operation of the processor 140, and may store input/output data (e.g., content input by a user, text data included in the content, etc.).

The memory 110 may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, card-type memory (e.g., SD or XD memory), RAM (random access memory), SRAM (static random access memory), ROM (read-only memory), EEPROM (electrically erasable programmable read-only memory), PROM (programmable read-only memory), magnetic memory, a magnetic disk, and an optical disk.

The communication module 120 may include one or more components that include circuitry enabling communication with an external device (e.g., a data server and/or a terminal device used by a user). For example, the communication module 120 may include at least one of a broadcast reception module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module. The communication module 120 may receive monitoring data from an external device.

The display 130 displays (outputs) information processed by the device 100 (e.g., personal information found in text data included in content).

For example, the display may show execution screen information of an application program running on the device 100, or UI (User Interface) and GUI (Graphic User Interface) information corresponding to that execution screen information.

The processor 140 may provide a service for preventing personal information exposure based on an artificial intelligence algorithm by executing one or more instructions stored in the memory 110. That is, the processor 140 may control overall operations and functions using each component of the device 100. The memory 110 and the processor 140 may be implemented as separate chips, or alternatively as a single chip.

The processor 140 may include an AI model learning module 140-1 and a personal information filtering module 140-2. The AI model learning module 140-1 and the personal information filtering module 140-2 may be controlled and managed by the processor 140.

The AI model learning module 140-1 may train a first AI model and a second AI model.

Specifically, the AI model learning module 140-1 may train the first AI model to output information related to one or more pieces of personal information included in input text data.

Here, the information related to the one or more pieces of personal information may include the type and importance level of each piece of personal information. The importance level may be set in advance for each type of personal information. If a type of personal information is sensitive data, such as a resident registration number, the importance level for that type may be set high.
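The per-type importance levels described above could be held in a simple lookup table. The following is a minimal sketch only: the type names, the numeric scale, and the default level are assumptions made for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass

# Hypothetical importance levels per personal-information type
# (higher means more sensitive). The scale is an assumption.
IMPORTANCE_BY_TYPE = {
    "resident_registration_number": 3,
    "phone_number": 2,
    "email": 1,
}
DEFAULT_LEVEL = 1  # assumed level for types missing from the table


@dataclass(frozen=True)
class PersonalInfo:
    """One detected item: the matched text and its type."""
    text: str
    info_type: str

    @property
    def importance(self) -> int:
        # The level is looked up by type, i.e. it is fixed in advance per type.
        return IMPORTANCE_BY_TYPE.get(self.info_type, DEFAULT_LEVEL)
```

Because the level is derived from the type, changing the table changes the level of every item of that type without retraining a model.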

As an example, the AI model learning module 140-1 may train the first AI model on training data consisting of corpus data and data in which the personal information included in the corpus data has been labeled.

The AI model learning module 140-1 may train the second AI model to extract text data included in input content. The second AI model may be trained to identify a plurality of texts displayed in each of a plurality of image frames constituting the content, and to output the texts that remain after duplicate texts are excluded from the identified texts.

As an example, the second AI model may be configured as shown in FIG. 3. The second AI model may include at least one of one or more convolutional layers, one or more long short-term memory (LSTM) layers, and a softmax layer.

Feature data may be extracted as the plurality of image frames constituting the content are input to the one or more convolutional layers. The feature data is input to the one or more LSTM layers, where the text included in each of the plurality of image frames is extracted/identified, and the text that remains after duplicates are excluded from the extracted/identified text may be output through the softmax layer.
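The duplicate-exclusion step can be illustrated independently of the recognition layers. The sketch below assumes that text has already been recognized per frame and removes repeats as a plain post-processing step, which is a simplification: in the disclosure the second AI model itself is trained to output the de-duplicated text. The function name and the whitespace normalization are hypothetical.

```python
def merge_frame_texts(frame_texts):
    """Merge per-frame recognized texts, keeping first-seen order
    and dropping texts already seen in an earlier frame."""
    seen = set()
    merged = []
    for texts in frame_texts:          # one list of strings per image frame
        for text in texts:
            key = " ".join(text.split())  # collapse runs of whitespace
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

A caption that stays on screen for many consecutive frames is therefore passed to the first AI model only once.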

The personal information filtering module 140-2 may filter/de-identify personal information included in content by using the AI models trained by the AI model learning module 140-1.

The personal information filtering module 140-2 may input text data included in content input by a user into the first AI model to obtain information related to one or more pieces of personal information included in the text data.

Specifically, the personal information filtering module 140-2 may input the content (i.e., the plurality of image frames constituting the content) into the second AI model to extract the text data included in the content. The personal information filtering module 140-2 may then input the extracted text data into the first AI model to obtain information related to the personal information included in the text data.

Additionally or alternatively, the personal information filtering module 140-2 may extract voice data included in the content and apply a speech-to-text (STT) algorithm to the extracted voice data to obtain text data corresponding to the voice data. The personal information filtering module 140-2 may then input the text data corresponding to the voice data into the first AI model to obtain information related to the personal information included in that text data.
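The two input paths described above (on-screen text via the second AI model, spoken text via STT) both feed the first AI model. A minimal sketch of that control flow follows. The three callables stand in for the second AI model, the STT algorithm, and the first AI model; their names and signatures are assumptions, and no real model or speech library is invoked.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Finding = Tuple[str, str]  # (matched text, personal-information type)


def find_personal_info(
    frames: Iterable[object],
    audio: Optional[object],
    extract_text: Callable[[Iterable[object]], str],   # stand-in: second AI model
    speech_to_text: Callable[[object], str],           # stand-in: STT algorithm
    detect: Callable[[str], List[Finding]],            # stand-in: first AI model
) -> List[Finding]:
    """Run detection on on-screen text and, if audio exists, on its transcript."""
    findings: List[Finding] = []
    screen_text = extract_text(frames)
    if screen_text:
        findings.extend(detect(screen_text))
    if audio is not None:
        spoken_text = speech_to_text(audio)
        if spoken_text:
            findings.extend(detect(spoken_text))
    return findings
```

Passing the models in as callables keeps the flow testable with simple stubs and leaves the choice of OCR and STT components open.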

The personal information filtering module 140-2 may upload the content after de-identifying the one or more pieces of personal information included in it. That is, when content is input by a user, the personal information filtering module 140-2 de-identifies the personal information included in the content before uploading that content.

The personal information filtering module 140-2 may transmit, to the terminal device used by the user, a message notifying the user of the personal information that was filtered/de-identified in the content. This message is sent to guard against cases in which personal information that the user intentionally exposed (e.g., a phone number) has been de-identified.

Based on receiving, from the terminal device used by the user, a second message requesting release of de-identification for a specific type of personal information, the personal information filtering module 140-2 may release the de-identification of that specific type of personal information among the one or more pieces of personal information in the content.

When other content is subsequently transmitted from the user, the personal information filtering module 140-2 may refrain from de-identifying the specific type of personal information among the personal information included in that other content.

In addition, based on receiving, from the terminal device used by the user, a message indicating that a specific word in the text included in the content is personal information, the personal information filtering module 140-2 may de-identify the specific word in the content. The AI model learning module 140-1 may then further train the first AI model so that the first AI model can identify the specific word as personal information.

FIG. 4 is a flowchart illustrating a method of preventing personal information exposure based on an artificial intelligence algorithm, according to an embodiment of the present disclosure.

Based on first content being transmitted from a terminal device used by a user, the device may extract at least one of first text data or first voice data included in the first content (S410).

Specifically, the device may input the first content (i.e., the plurality of image frames constituting the first content) into the second AI model to obtain the first text data included in the first content.

Additionally or alternatively, the device may extract the first voice data from the first content and apply an STT algorithm to the extracted first voice data to obtain second text data.

The device may input at least one of the first text data and the second text data corresponding to the first voice data into the pre-trained first AI model to obtain information related to one or more pieces of personal information included in the first text data or the second text data (S420).

The device may upload the first content after de-identifying the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content (S430).

Specifically, the device may blur and/or mosaic the regions of the plurality of image frames of the first content in which at least one piece of personal information is displayed. As another example, the device may replace at least one piece of personal information displayed in the plurality of image frames of the first content with predefined characters.
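Replacing personal information with predefined characters can be shown on plain text. In the sketch below a regular expression for Korean mobile numbers stands in for the first AI model's detection, purely as an assumption for illustration; the pattern, the function name, and the choice of `*` as the predefined character are all hypothetical.

```python
import re

# Hypothetical stand-in for model-based detection: Korean mobile numbers
# written as 01X-XXXX-XXXX or 01X-XXX-XXXX.
PHONE_PATTERN = re.compile(r"\b01[016789]-\d{3,4}-\d{4}\b")


def mask_phone_numbers(text: str, mask_char: str = "*") -> str:
    """Replace every digit of each matched phone number with mask_char,
    keeping the hyphens so the masked span has the same length."""
    def _mask(match):
        return re.sub(r"\d", mask_char, match.group())
    return PHONE_PATTERN.sub(_mask, text)
```

For example, `mask_phone_numbers("Call 010-1234-5678 now")` yields `"Call ***-****-**** now"`. Keeping the span length unchanged means the masked text can be drawn back over the same on-screen region.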

Additionally or alternatively, the device may insert a predefined sound at the time at which speech containing personal information would be output when the first content is played. Alternatively, the device may modulate that speech at the time at which it is output when the first content is played.
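Inserting a predefined sound over a time interval can be sketched on raw audio samples. The example below assumes mono audio held as a list of floats in the range -1 to 1 and assumes the intervals containing personal information are already known in seconds (for instance from STT word timestamps); the function name, the 1 kHz tone, and the amplitude are illustrative choices, not details from the disclosure.

```python
import math


def bleep(samples, sample_rate, intervals, tone_hz=1000.0, amplitude=0.3):
    """Return a copy of samples in which each (start_s, end_s) interval
    is overwritten with a sine tone. The input list is not modified."""
    out = list(samples)
    for start_s, end_s in intervals:
        start = max(0, int(start_s * sample_rate))
        end = min(len(out), int(end_s * sample_rate))
        for i in range(start, end):
            out[i] = amplitude * math.sin(2 * math.pi * tone_hz * i / sample_rate)
    return out
```

Overwriting rather than mixing ensures the original speech cannot be recovered from the uploaded audio in the masked intervals.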

The device may upload the first content in which the one or more pieces of personal information have been de-identified in the manner described above.

The device may transmit, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information that were filtered/de-identified in the first content (S440).

That is, the device may inform the user that exposure of the user's personal information was prevented by de-identifying all personal information before uploading the first content. The device may also ask whether any personal information was de-identified contrary to the user's intention (i.e., unnecessarily).
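The disclosure does not specify a format for the first message. As one possible shape, the sketch below summarizes what was de-identified by type and carries the follow-up question; every field name and the wording of the question are assumptions.

```python
from collections import Counter


def build_notice(findings):
    """Build a hypothetical first-message payload from a list of
    (matched text, personal-information type) findings."""
    counts = Counter(info_type for _, info_type in findings)
    return {
        "kind": "deidentification_notice",
        # Report counts per type only, so the message itself does not
        # repeat the personal information it describes.
        "masked": dict(sorted(counts.items())),
        "question": "Was any of this de-identified unnecessarily?",
    }
```

Reporting types and counts rather than the raw values is a design choice made here to avoid re-exposing the information in the notification.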

Based on receiving, from the terminal device used by the user, a second message requesting release of de-identification for a first type of personal information, the device may release the de-identification of the first type of personal information among the one or more pieces of personal information in the first content (S450).

Thereafter, the device may refrain from de-identifying the first type of personal information in content received from the user.

Specifically, based on second content being received from the terminal device used by the user, the device may extract third text data and second voice data included in the second content. The device may input at least one of the third text data and fourth text data corresponding to the second voice data into the first AI model to obtain information related to one or more pieces of personal information included in the third text data or the fourth text data.

The device may upload the second content after de-identifying the remaining personal information, excluding the first type of personal information, in at least one of the third text data or the second voice data included in the second content.
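Carrying a release request over to later content amounts to remembering, per user, which types were released and skipping them on subsequent uploads. The following sketch assumes an in-memory store keyed by a user identifier; the class and method names are hypothetical, and a real service would need persistent storage.

```python
class ReleasePreferences:
    """Remember which personal-information types each user asked to release."""

    def __init__(self):
        self._released = {}  # user_id -> set of released type names

    def release(self, user_id, info_type):
        """Record a second-message request for this user and type."""
        self._released.setdefault(user_id, set()).add(info_type)

    def to_mask(self, user_id, findings):
        """Return only the findings that should still be de-identified."""
        released = self._released.get(user_id, set())
        return [f for f in findings if f[1] not in released]
```

Here each finding is a `(matched text, type)` pair, so after `release("u1", "phone_number")` phone numbers in that user's later content are left visible while other types are still masked.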

As another example of the present disclosure, based on receiving, from the terminal device used by the user, a third message indicating that a specific word included in at least one of the first text data or the first voice data is a second type of personal information, the device may de-identify the specific word in the first content. The device may then further train the first AI model based on the third message.

That is, the device may further train the first AI model so that the first AI model identifies the specific word as the second type of personal information.
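One way to prepare for such additional training is to turn each third message into a labeled example and append it to the training data. The sketch below covers only that data-preparation step under assumed conventions (character-offset spans, dictionary records); the actual fine-tuning procedure is not described in the disclosure and is not shown here.

```python
def add_feedback_example(training_set, text, word, info_type):
    """Append a labeled example marking the first occurrence of word
    in text as personal information of the given type."""
    start = text.find(word)
    if start < 0:
        raise ValueError("word not found in text")
    example = {
        "text": text,
        # (start offset, end offset, label), end exclusive
        "spans": [(start, start + len(word), info_type)],
    }
    training_set.append(example)
    return example
```

The record mirrors the labeled-corpus format mentioned for the initial training, so user feedback and the original corpus could be mixed in one training run.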

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate program modules to perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all types of recording media in which instructions that can be decoded by a computer are stored. Examples include ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. A person of ordinary skill in the art to which the present disclosure pertains will understand that the present disclosure may be practiced in forms different from the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

100: device
110: memory
120: communication module
130: display
140: processor

Claims

A method, performed by a device, of providing a service for preventing personal information exposure based on an artificial intelligence (AI) algorithm, the method comprising:
extracting, based on first content being transmitted from a terminal device used by a user, at least one of first text data or first voice data included in the first content;
obtaining information related to one or more pieces of personal information included in the first text data or second text data by inputting at least one of the first text data and the second text data, which corresponds to the first voice data, into a pre-trained first AI model;
uploading the first content after de-identifying the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content;
transmitting, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information de-identified in the first content; and
releasing, based on receiving from the terminal device used by the user a second message requesting release of de-identification for a first type of personal information, the de-identification of the first type of personal information among the one or more pieces of personal information in the first content,
wherein the method further comprises: extracting, based on second content being received from the terminal device used by the user, third text data and second voice data included in the second content;
obtaining information related to one or more pieces of personal information included in the third text data or fourth text data by inputting at least one of the third text data and the fourth text data, which corresponds to the second voice data, into the first AI model; and
uploading the second content after de-identifying the remaining personal information, excluding the first type of personal information, in at least one of the third text data or the second voice data included in the second content.

The method of claim 1,
further comprising obtaining the second text data by applying a speech-to-text (STT) algorithm to the first voice data extracted from the first content.

The method of claim 1,
wherein, based on a plurality of image frames constituting the first content being input to a pre-trained second AI model, the second AI model:
identifies a plurality of texts displayed in each of the plurality of image frames; and
outputs, as the first text data, the texts that remain after duplicate texts are excluded from the plurality of texts.

The method of claim 3,
wherein the second AI model
comprises at least one of one or more convolutional layers, one or more long short-term memory (LSTM) layers, and a softmax layer.

(Deleted)

The method of claim 1, further comprising:
de-identifying, based on receiving from the terminal device used by the user a third message indicating that a specific word included in at least one of the first text data or the first voice data is a second type of personal information, the specific word in the first content; and
further training the first AI model based on the third message.

A device for providing a service for preventing personal information exposure based on an artificial intelligence (AI) algorithm, the device comprising:
at least one memory; and
at least one processor,
wherein the at least one processor comprises a personal information filtering module configured to:
extract, based on first content being transmitted from a terminal device used by a user, at least one of first text data or first voice data included in the first content;
obtain information related to one or more pieces of personal information included in the first text data or second text data by inputting at least one of the first text data and the second text data, which corresponds to the first voice data, into a pre-trained first AI model;
upload the first content after de-identifying the one or more pieces of personal information in at least one of the first text data or the first voice data included in the first content;
transmit, to the terminal device used by the user, a first message notifying the user of the one or more pieces of personal information de-identified in the first content; and
release, based on receiving from the terminal device used by the user a second message requesting release of de-identification for a first type of personal information, the de-identification of the first type of personal information among the one or more pieces of personal information in the first content,
wherein the personal information filtering module is further configured to:
extract, based on second content being received from the terminal device used by the user, third text data and second voice data included in the second content;
obtain information related to one or more pieces of personal information included in the third text data or fourth text data by inputting at least one of the third text data and the fourth text data, which corresponds to the second voice data, into the first AI model; and
upload the second content after de-identifying the remaining personal information, excluding the first type of personal information, in at least one of the third text data or the second voice data included in the second content.

The device of claim 7,
wherein the personal information filtering module
obtains the second text data by applying a speech-to-text (STT) algorithm to the first voice data extracted from the first content.

The device of claim 7,
wherein, based on a plurality of image frames constituting the first content being input to a pre-trained second AI model, the second AI model:
identifies a plurality of texts displayed in each of the plurality of image frames; and
outputs, as the first text data, the texts that remain after duplicate texts are excluded from the plurality of texts.

The device of claim 9,
wherein the second AI model
comprises at least one of one or more convolutional layers, one or more long short-term memory (LSTM) layers, and a softmax layer.

(Deleted)

The device of claim 7,
wherein the personal information filtering module,
based on receiving from the terminal device used by the user a third message indicating that a specific word included in at least one of the first text data or the first voice data is a second type of personal information, de-identifies the specific word in the first content, and
wherein an AI model learning module included in the at least one processor
further trains the first AI model based on the third message.