KR102576358B1 - Learning data generating device for sign language translation and method of operation thereof

Info

Publication number: KR102576358B1
Application number: KR1020220182743A
Authority: KR
Inventors: 김종화; 조남제
Original assignee: 주식회사 케이엘큐브
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-09-11

Abstract

A method of operating a learning data generating device that generates learning data for artificial intelligence-based sign language translation, according to an embodiment of the present disclosure, includes: requesting translation of first text data composed of Korean sentences into second text data composed of sign language sentences; upon receiving the second text data, obtaining one or more semantic elements from the second text data; obtaining sign language image data for the first text data; generating learning data based on the one or more semantic elements and the sign language image data; and storing the learning data.

Description

Learning data generation device for sign language translation and its operation method {LEARNING DATA GENERATING DEVICE FOR SIGN LANGUAGE TRANSLATION AND METHOD OF OPERATION THEREOF}

The present disclosure relates to a learning data generating device and an operating method thereof, and more specifically, to a learning data generating device that generates learning data for sign language translation for use in artificial intelligence training, and an operating method thereof.

Sign language is the most representative means of communicating with deaf people and refers to a method of conveying meaning through gestures. Sign language has its own grammar and expresses sentences through a continuous sequence of gestures, each with a defined meaning.

The grammar of sign language differs from that of vocal language, which is expressed through voice and text. Because hearing people must complete lengthy specialized training before they can express themselves comfortably in sign language, the majority of hearing people have difficulty communicating with deaf people in sign language. To resolve this difficulty, technology that converts the vocal language familiar to hearing people into the sign language familiar to deaf people (hereinafter, sign language translation technology) is essential, and research on sign language interpretation/translation technology using artificial intelligence has recently been actively conducted.

Machine learning is a field of artificial intelligence. It can be described as a process of training a computer on learning data prepared in advance so that, based on what it has learned, the computer finds appropriate answers to new inputs. When the learning data used to train the computer provides both the question (training input) and the correct answer (training output), the data is said to be labeled.

Meanwhile, when sign language translation (or interpretation; the same applies hereinafter) is performed based on machine learning, collecting labeled learning data is as important as the feature extraction and learning algorithms, and the more labeled learning data is provided, the more effectively learning can proceed. This requires thousands to tens of thousands of labeled learning data items, but because labeled learning data is generally created manually, obtaining a large amount of it is not easy. Therefore, effective machine learning requires a way to generate labeled learning data efficiently.

An object of the present disclosure is to provide a learning data generating device, and an operating method thereof, that can efficiently generate the learning data needed to obtain an artificial intelligence model capable of smooth sign language interpretation/translation.

According to an embodiment, obtaining the one or more semantic elements may further include separating the second text data, through morphological analysis of the second text data, into the one or more semantic elements, which are the minimum semantic units of sign language.

According to an embodiment, obtaining the sign language image data may further include providing the first text data and the second text data to a photographing device including a camera, and receiving the sign language image data from the photographing device.

According to an embodiment, generating the learning data may further include: transmitting the one or more semantic elements and the sign language image data to a worker terminal; receiving, from the worker terminal, mapping information in which the one or more semantic elements are matched to the sign language image data based on a timeline; and generating token information based on the mapping information.

According to an embodiment, storing the learning data may further include, when the accuracy of the token information is greater than or equal to a reference value, generating a learning database based on the learning data, in which the input is the first text data and the correct answer is the sequence of the one or more semantic elements.
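The storing condition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `store_if_accurate`, the list-backed `database`, and the default `reference_value` of 0.9 are hypothetical, and the accuracy score is taken as an input because this passage does not say how it is computed.

```python
def store_if_accurate(database, first_text, elements, accuracy, reference_value=0.9):
    """Append one labeled record only when the token accuracy meets the reference value.

    `database` is a plain list standing in for the learning database (assumption).
    `reference_value` defaults to a hypothetical 0.9; the text gives no number.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy is assumed to be a score between 0 and 1")
    if accuracy < reference_value:
        return False  # below the reference value: the record is not stored
    # Input is the Korean sentence; the correct answer is the element sequence.
    database.append({"input": first_text, "answer": list(elements)})
    return True


db = []
store_if_accurate(db, "I bought a sturdy house", ["I", "house", "buy", "strong"], 0.95)
store_if_accurate(db, "I bought a sturdy house", ["I", "house"], 0.40)
print(len(db))  # 1
```

Only the first call stores a record, because the second score falls below the assumed reference value.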

A learning data generating device according to an embodiment of the present disclosure includes an interface configured to transmit and receive data to and from external devices, a sensor configured to acquire sign language image data, and a processor. The processor is configured to: request a sign language sentence translation device among the external devices to translate first text data composed of Korean sentences into second text data composed of sign language sentences; upon receiving the second text data, obtain one or more semantic elements from the second text data; generate learning data based on the one or more semantic elements and the sign language image data; and store the learning data to create a database.

The learning data generating device and its operating method according to an embodiment of the present disclosure can efficiently generate the labeled learning data needed to train an artificial intelligence-based sign language translation system. This allows machine learning to proceed effectively, and the learning data thus generated can be used to increase the usability of the sign language translation system.

FIG. 1 is a block diagram showing a sign language learning system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram showing the learning data generating device of FIG. 1.
FIG. 3 is a block diagram showing the processor of FIG. 2.
FIG. 4 is a flowchart showing a method of operating a learning data generating device according to an embodiment of the present disclosure.
FIG. 5 is a diagram for explaining step S150 of FIG. 4.
FIGS. 6A and 6B are diagrams for explaining step S410 of FIG. 5.
FIG. 7 is a block diagram showing a sign language translation system to which a learning data generating device according to an embodiment of the present disclosure is applied.

Hereinafter, embodiments of the present invention are described clearly and in detail so that a person of ordinary skill in the art can readily practice the invention. However, since the present invention can be implemented in various different forms within the scope of the claims, the embodiments described below are merely examples, whether or not they are expressly described as such. That is, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

Hereinafter, embodiments of the present invention are described with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a sign language learning system according to an embodiment of the present disclosure. The sign language learning system 10 can generate labeled learning data used in machine learning for sign language translation. For example, the sign language learning system 10 may generate learning data through a data labeling task based on a sign language sentence and a sign language image that both correspond to a Korean sentence. Referring to FIG. 1, the sign language learning system 10 may include a learning data generating device 100, a sign language sentence translation device 200, a sign language image capturing device 300, and a worker terminal 400.

The learning data generating device 100 can receive a Korean sentence to be translated into sign language. The Korean sentence may be a sentence specified or entered through a UI by a user of the learning data generating device 100. When the device is mounted and operated in the broadcast signal receiver of the learning data generating device 100, the Korean sentence may be a caption-broadcast sentence that has been separated from the received broadcast signal and decoded. It may also be a translation request sentence transmitted from another computer or server system that can be linked through a communication network.

The learning data generating device 100 is configured to control a series of operations for generating learning data that takes a Korean sentence as input. For example, the learning data generating device 100 may request the sign language sentence translation device 200 to translate the Korean sentence into a sign language sentence. The learning data generating device 100 may request the sign language image capturing device 300 to capture a sign language image for the Korean sentence. The learning data generating device 100 may control the labeling task by providing the sign language sentence and the sign language image to the worker terminal 400.
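The control flow described in this paragraph can be sketched as a small orchestration function. This is an illustration only: the function `run_labeling_pipeline` and its three callable parameters are hypothetical stand-ins for the translation device 200, the capturing device 300, and the worker terminal 400, and in the described system each step is performed over a network with human experts involved.

```python
from typing import Callable


def run_labeling_pipeline(
    first_text: str,
    translate: Callable[[str], str],
    capture: Callable[[str, str], bytes],
    label: Callable[[str, bytes], list],
) -> dict:
    """Drive one Korean sentence through translation, capture, and labeling."""
    # 1. Request translation of the Korean sentence into a sign language sentence.
    second_text = translate(first_text)
    # 2. Request a sign language image, passing both texts to the capturing side.
    image_data = capture(first_text, second_text)
    # 3. Hand the sign language sentence and the image to the labeling side.
    labels = label(second_text, image_data)
    return {
        "first_text": first_text,
        "second_text": second_text,
        "image_data": image_data,
        "labels": labels,
    }


# Stub callables (assumptions) that stand in for the external devices.
record = run_labeling_pipeline(
    "I bought a sturdy house",
    translate=lambda text: "I + house + buy, house + strong",
    capture=lambda first, second: b"<image bytes>",
    label=lambda second, image: ["I", "house", "buy", "strong"],
)
print(record["second_text"])  # I + house + buy, house + strong
```

Passing the three steps in as callables keeps the sketch independent of any particular transport or device API.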

According to an embodiment, the learning data generating device 100 may be implemented as a computer, server, or cloud that can access a remote server or terminal through a network. Here, the computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. That is, the learning data generating device 100 can communicate with the sign language sentence translation device 200, the sign language image capturing device 300, and the worker terminal 400 through the network. The learning data generating device 100 is described in detail later with reference to FIG. 2.

The network can electrically interconnect the learning data generating device 100, the sign language sentence translation device 200, the sign language image capturing device 300, and the worker terminal 400 of the sign language learning system 10. The network refers to a connection structure that allows information to be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television communication networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcasting networks, analog broadcasting networks, and DMB (Digital Multimedia Broadcasting) networks.

The sign language sentence translation device 200 is configured to receive a Korean sentence from the learning data generating device 100 and translate the Korean sentence into a sign language sentence. The sign language sentence translation device 200 may perform the translation based on an input signal from a translation expert. To distinguish between the two, the Korean sentence may be referred to as first text data and the sign language sentence as second text data. For example, the sign language sentence translation device 200 may receive first text data composed of a Korean sentence and convert it into second text data composed of a sign language sentence.

Because sign language sentences follow a different grammatical system from Korean sentences, translation into sign language sentences by an expert is required. For example, if the sentence "I bought a sturdy house" is translated word for word, it may be expressed as "I + sturdy + house + buy", which a deaf person unfamiliar with Korean sentences may understand as "I am sturdy and (I) bought a house." Therefore, to convey the intended meaning properly, the word order can be changed to "I + house + buy, house + strong" with a descriptive expression added, which conveys the meaning more clearly.

The sign language sentence translation device 200 may provide the second text data to the learning data generating device 100. The learning data generating device 100 may provide the first text data and the second text data to the sign language image capturing device 300.

The sign language image capturing device 300 may be configured to capture a sign language image for a Korean sentence. For example, the sign language image capturing device 300 may include a camera and may generate sign language image data for the first text data through the camera. The sign language image may be recorded by a sign language expert, and according to an embodiment, the sign language expert may produce the sign language image with reference to the second text data. According to an embodiment, the sign language image capturing device 300 may be integrated into the learning data generating device 100.

The worker terminal 400 may include a mobile terminal of a worker who performs the labeling task. The worker terminal 400 is a wireless communication device that ensures portability and mobility, and may include all types of handheld wireless communication devices, such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smartpad, or a tablet PC.

The worker terminal 400 may perform the data labeling task through a web server or application provided by the learning data generating device 100. For example, based on the worker's input signal, the worker terminal 400 may generate a token by mapping semantic elements, which are the minimum semantic units of sign language, to the sign language image data. Once mapping is complete, the token may be verified by the learning data generating device 100 and stored as labeled learning data.

According to an embodiment, the learning data generating device 100 may provide a labeling authoring tool to the worker terminal 400. The labeling authoring tool may include a user interface designed so that the mapping of sign language image data can be performed efficiently. The worker can use the labeling authoring tool through the worker terminal 400 and, by generating tokens with simple operations, can perform the labeling task in less time than with existing authoring tools. The labeling authoring tool is described in detail later with reference to FIG. 6A.

FIG. 2 is a block diagram showing the learning data generating device of FIG. 1. Referring to FIGS. 1 and 2, the learning data generating device 100 may include an interface 110, a sensor 120, a database 130, a communication module 140, a memory 150, and a processor 170.

The interface 110 may be configured to transmit and receive data or information to and from the sign language sentence translation device 200, the sign language image capturing device 300, and the worker terminal 400. For example, the interface 110 may transmit the first text data to the sign language sentence translation device 200 and receive the second text data. The interface 110 may transmit the first text data and the second text data to the sign language image capturing device 300 and receive the sign language image data. The interface 110 may transmit morpheme information and the sign language image data to the worker terminal 400 and receive token information.

The sensor 120 may include a camera and a motion recognition device for generating sign language image data. For example, the motion recognition device is worn by a deaf person: when the deaf person wears it on the hand and signs, a plurality of built-in motion recognition sensors recognize the hand movements. The sensor 120 corresponds to the sign language image capturing device 300 of FIG. 1 and may be omitted depending on the embodiment. In that case, the external sign language image capturing device 300 may generate the sign language image data.

The database 130 may be created by the processor 170. The database 130 may store labeled learning data. For example, the database 130 may store the learning data in the form of a lookup table and may provide the stored learning data when an artificial intelligence model is trained.

The database 130 may be composed of non-volatile memory. Non-volatile memory refers to a storage medium that stores and retains information even when power is not supplied, so that the stored information can be used again once power is supplied. The non-volatile memory may include, for example, at least one of flash memory, a hard disk, an SSD (Solid State Drive), a multimedia card micro type, card-type memory (for example, SD or XD memory), ROM (Read Only Memory), magnetic memory, a magnetic disk, and an optical disk.
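As a rough illustration of a lookup-table store for labeled learning data, the sketch below uses Python's standard `sqlite3` module. The table name `training_data`, its columns, and the helper names are assumptions made for this example; the text does not specify a schema or a database engine. Passing a file path instead of `":memory:"` would keep the table on non-volatile storage.

```python
import json
import sqlite3


def create_training_db(path=":memory:"):
    """Open a database and create the (hypothetical) lookup table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_data ("
        "first_text TEXT PRIMARY KEY, elements TEXT NOT NULL)"
    )
    return conn


def put_record(conn, first_text, elements):
    """Store the element sequence as JSON, keyed by the Korean sentence."""
    conn.execute(
        "INSERT OR REPLACE INTO training_data (first_text, elements) VALUES (?, ?)",
        (first_text, json.dumps(elements, ensure_ascii=False)),
    )
    conn.commit()


def get_record(conn, first_text):
    """Look up the element sequence for a sentence, or None if absent."""
    row = conn.execute(
        "SELECT elements FROM training_data WHERE first_text = ?", (first_text,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = create_training_db()
put_record(conn, "I bought a sturdy house", ["I", "house", "buy", "strong"])
print(get_record(conn, "I bought a sturdy house"))  # ['I', 'house', 'buy', 'strong']
```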

The communication module 140 may communicate with other electronic devices or external devices under the control of the processor 170. Through a communication interface, the communication module 140 may connect to a network, or connect device to device, via wireless or wired communication. Wireless communication may include, for example, at least one of Wi-Fi (wireless fidelity), BT (Bluetooth), NFC (near field communication), GPS (global positioning system), or cellular communication (for example, LTE, LTE-A, CDMA, WCDMA, UMTS, WiBro, or GSM). Wired communication may include, for example, at least one of USB (universal serial bus), HDMI (high definition multimedia interface), RS-232 (recommended standard 232), or POTS (plain old telephone service).

The memory 150 may store a program including one or more instructions. The memory 150 may store at least one of instructions, algorithms, data structures, program code, and application programs that the processor 170 can read. The instructions, algorithms, data structures, and program code stored in the memory 150 may be implemented in a programming or scripting language such as C, C++, Java, or assembler.

For example, the memory 150 may include at least one type of hardware device among flash memory, RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), and PROM (Programmable Read-Only Memory).

The processor 170 may be electrically connected to the interface 110, the sensor 120, the database 130, the communication module 140, and the memory 150, and may be configured to control each of these components. According to an embodiment, the processor 170 may include one or more of a central processing unit (CPU), an application processor (AP), or a communication processor (CP). The processor 170 may, for example, execute operations or data processing related to the control and/or communication of at least one other component of the learning data generating device 100. The processing (or control) operations of the processor 170 are described in detail with reference to FIG. 3.

FIG. 3 is a block diagram showing the processor of FIG. 2. The processor 170 may, for example, run an operating system or an application program to control a number of hardware or software components connected to the processor 170, and may perform various kinds of data processing and computation. The processor 170 may be implemented, for example, as a system on chip (SoC). According to an embodiment, the processor 170 may further include a graphic processing unit (GPU) and/or an image signal processor (ISP).

Referring to FIGS. 1 to 3, the processor 170 may include a morpheme analysis unit 171, a token generation unit 172, and a verification unit 173.

The morpheme analysis unit 171 may be configured to extract semantic elements from the second text data through morpheme analysis. A semantic element may be the minimum semantic unit of a sign language. For example, if the sentence “나는 튼튼한 집을 샀다” (“I bought a sturdy house”) is translated into the sign language sentence “나+집+사다, 집+강하다” (“I + house + buy, house + strong”), four semantic elements may be extracted: “나” (I), “집” (house), “사다” (buy), and “강하다” (strong). Information on these semantic elements may be referred to as morpheme information. The morpheme analysis unit 171 may provide the morpheme information to the worker terminal 400.
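The extraction in the example above can be illustrated with a minimal Python sketch. It assumes that glosses are separated by `+` within a clause and by `,` between clauses, as in the example notation; the function name `extract_semantic_elements` is hypothetical, and an actual morpheme analyzer would be considerably more involved than a string split.

```python
import re


def extract_semantic_elements(sign_sentence: str) -> list[str]:
    """Split a gloss-style sign language sentence into unique semantic
    elements, preserving the order in which each element first appears."""
    # Assumption: '+' joins glosses within a clause, ',' separates clauses.
    parts = [p.strip() for p in re.split(r"[+,]", sign_sentence)]
    seen = set()
    elements = []
    for part in parts:
        if part and part not in seen:
            seen.add(part)
            elements.append(part)
    return elements


elements = extract_semantic_elements("나+집+사다, 집+강하다")
print(elements)  # ['나', '집', '사다', '강하다']
```

The repeated gloss “집” is kept only once, which matches the four semantic elements named in the example.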

Depending on the embodiment, the morpheme analysis unit 171 may be designed to use a bidirectional LSTM (Long Short-Term Memory) morpheme analyzer to parse the syntax of a Korean sentence, understand its meaning, and select the most suitable words accordingly. Such a morpheme analyzer can recognize the base form of each morpheme that makes up a word and can remove elements, such as postpositional particles, that are unnecessary for sign language interpretation.

The token generation unit 172 may be configured to generate a token based on mapping information received from the worker terminal 400. For example, the mapping information may include information in which one or more semantic elements are matched to the sign language image data based on a timeline. This matching task may be performed through a labeling authoring tool. When the matching task on the worker terminal 400 is completed, the token generation unit 172 may generate token information including morpheme information and timeline information.
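One possible shape for such token information is sketched below in Python. The field names (`element`, `start_s`, `end_s`, `video_id`), the dictionary layout of the mapping information, and the function `generate_tokens` are all assumptions made for illustration; the disclosure does not specify a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    element: str    # semantic element (morpheme information)
    start_s: float  # start of the matched interval, in seconds
    end_s: float    # end of the matched interval, in seconds
    video_id: str   # hypothetical identifier of the sign language image data


def generate_tokens(mapping: list[dict], video_id: str) -> list[Token]:
    """Build tokens from mapping information and order them by start time."""
    tokens = []
    for entry in mapping:
        if entry["end_s"] <= entry["start_s"]:
            raise ValueError(f"empty interval for {entry['element']!r}")
        tokens.append(
            Token(entry["element"], entry["start_s"], entry["end_s"], video_id)
        )
    return sorted(tokens, key=lambda t: t.start_s)


# Hypothetical mapping information as it might arrive from the worker terminal.
mapping = [
    {"element": "집", "start_s": 0.8, "end_s": 1.5},
    {"element": "나", "start_s": 0.0, "end_s": 0.7},
]
tokens = generate_tokens(mapping, video_id="clip-001")
print([t.element for t in tokens])  # ['나', '집']
```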

The verification unit 173 may be configured to verify the accuracy of the token information. For example, the verification unit 173 may convert one or more semantic elements into test image data based on the token information and compare the test image data with the sign language image data. The verification unit 173 may determine the accuracy of the token information by using a similarity determination algorithm to verify whether the test image data and the sign language image data are similar. For example, when the similarity between the test image data and the sign language image data is greater than or equal to a certain value, the verification unit 173 may determine that the accuracy of the token information is greater than or equal to a reference value and may complete the verification.
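The threshold check described above might look like the following Python sketch. The disclosure does not name the similarity determination algorithm; cosine similarity over pre-extracted feature vectors is used here purely as a stand-in, and the function names and the 0.9 threshold are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_token_info(test_features: list[float],
                      sign_features: list[float],
                      threshold: float = 0.9) -> bool:
    """Return True when the similarity meets the (assumed) threshold."""
    return cosine_similarity(test_features, sign_features) >= threshold


# Assumption: both videos were already reduced to feature vectors elsewhere.
print(verify_token_info([1.0, 0.0], [1.0, 0.0]))  # True
print(verify_token_info([1.0, 0.0], [0.0, 1.0]))  # False
```

How the test image data and the sign language image data are turned into comparable features is outside the scope of this sketch.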

FIG. 4 is a flowchart showing a method of operating a learning data generating device according to an embodiment of the present disclosure. Referring to FIG. 4, the operating method S100 of the learning data generating device 100 may include steps S110 to S160 to generate labeled learning data.

In step S110, the learning data generating device 100 may receive a Korean sentence. The Korean sentence may be a sentence specified or entered through a UI by a user of the learning data generating device 100. When the learning data generating device 100 is mounted and operated in a broadcast signal receiver, the Korean sentence may be a closed-caption sentence separated from the received broadcast signal and decoded. The Korean sentence may also be a translation request sentence transmitted from another computer or server system reachable over a communication network. The learning data generating device 100 may store the Korean sentence in the memory 150 as first text data.

In step S120, the learning data generating device 100 may request a sign language sentence for the Korean sentence. For example, the learning data generating device 100 may request the sign language sentence for the Korean sentence from the sign language translation device 200. The learning data generating device 100 may receive the translated sign language sentence as second text data and store it in the memory 150.

In step S130, the learning data generating device 100 may perform morphological analysis on the sign language sentence. For example, the learning data generating device 100 may extract one or more semantic elements from the second text data through morphological analysis. A semantic element may be defined as the minimum semantic unit of a sign language. A sign language sentence may be composed of one or more semantic elements, and the learning data generating device 100 may separate the second text data into one or more semantic elements.

In step S140, the learning data generating device 100 may acquire a sign language image for the Korean sentence. For example, the learning data generating device 100 may request the sign language image capturing device 300 to capture a sign language image for the Korean sentence. In this case, the learning data generating device 100 may provide the first text data to the sign language image capturing device 300 and, depending on the embodiment, may provide the second text data together with it. The sign language image capturing device 300 may capture a sign language image based on the first text data and the second text data and provide the resulting sign language image data to the learning data generating device 100. Depending on the embodiment, the learning data generating device 100 may acquire the sign language image data through the internal sensor 120.

In step S150, the learning data generating device 100 may generate learning data. For example, the learning data generating device 100 may provide information about one or more semantic elements and the sign language image data to the worker terminal 400, and may receive from the worker terminal 400 mapping information for which the labeling task has been completed. Depending on the embodiment, the mapping information may include information in which one or more semantic elements are matched to the sign language image data based on a timeline. The learning data generating device 100 may generate a token based on the mapping information and generate learning data by verifying the token. This is described in detail below with reference to FIG. 5.

In step S160, the learning data generating device 100 may create a database. For example, the learning data generating device 100 may store verified learning data in the database as labeled learning data. That is, when the accuracy of the token information is greater than or equal to the reference value, the learning data generating device 100 may create the database by storing labeled learning data whose input is the first text data and whose correct answer is a sequence of one or more semantic elements.
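A minimal Python sketch of this storage step follows, using an in-memory SQLite database from the standard library. The table name `learning_data`, its columns, the function `store_labeled_sample`, and the reference value of 0.9 are hypothetical choices for illustration; the disclosure does not specify a database schema.

```python
import json
import sqlite3


def store_labeled_sample(conn: sqlite3.Connection,
                         first_text: str,
                         elements: list[str],
                         accuracy: float,
                         reference: float = 0.9) -> bool:
    """Store an (input, correct answer) pair only when the token accuracy
    is greater than or equal to the reference value."""
    if accuracy < reference:
        return False
    conn.execute(
        "INSERT INTO learning_data (input_text, answer_sequence) VALUES (?, ?)",
        (first_text, json.dumps(elements, ensure_ascii=False)),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE learning_data (input_text TEXT, answer_sequence TEXT)")

stored = store_labeled_sample(
    conn, "나는 튼튼한 집을 샀다", ["나", "집", "사다", "강하다"], accuracy=0.95
)
rejected = store_labeled_sample(
    conn, "나는 튼튼한 집을 샀다", ["나"], accuracy=0.5
)
print(stored, rejected)  # True False
```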

FIG. 5 is a diagram for explaining step S150 of FIG. 4. Referring to FIGS. 4 and 5, the learning data generating device 100 may generate learning data while communicating with the worker terminal 400.

In step S151, the learning data generating device 100 may transmit the sign language image data and the morpheme information to the worker terminal 400. The sign language image data may include a sign language image captured by the sign language image capturing device 300 or the sensor 120, and the morpheme information may include information about one or more semantic elements generated through morphological analysis.

Depending on the embodiment, the worker terminal 400 may perform a mapping task or a labeling task using the sign language image data and the morpheme information (S410). In this case, the learning data generating device 100 may support a labeling authoring tool for performing the mapping task or labeling task, and the labeling authoring tool may be presented on the worker terminal 400 through a UI.

For example, referring to FIG. 6A, the labeling authoring tool may include first to third screens S1, S2, and S3. The first screen S1 may be a screen on which a sign language image is played based on the sign language image data. The second screen S2 may be a screen that displays information generated through the labeling authoring tool, such as task information, token information, and memos. For example, the task information may include a task name, a file name, a total time, a person in charge, a status, and a last update date. For example, the token information may include word, agreement verb, and time (rearranged timeline) information. The third screen S3 may be a screen that displays a timeline for the sign language image and the semantic elements matched on that timeline.

The worker may run the labeling authoring tool through the worker terminal 400. When the labeling authoring tool is executed, the worker terminal 400 may play the sign language image on the first screen S1 based on the sign language image data and list one or more semantic elements on the third screen S3 based on the morpheme information. For example, first to fourth semantic elements E1, E2, E3, and E4 may be listed on the third screen S3.

The first to fourth semantic elements E1, E2, E3, and E4 may be combined with time information to generate tokens. The worker may add time information to the first to fourth semantic elements E1, E2, E3, and E4 through the worker terminal 400. For example, the worker may use an input device such as a mouse to match the first to fourth semantic elements E1, E2, E3, and E4 to the corresponding playback times of the sign language image.

Referring to the third screen S3, the worker terminal 400 may match the first to fourth semantic elements E1, E2, E3, and E4 to the sign language image in chronological order based on the worker's input signal. For example, the first semantic element E1 may be matched to a first section (t11 to t12) of the sign language image, the second semantic element E2 to a second section (t21 to t22), the third semantic element E3 to a third section (t31 to t32), and the fourth semantic element E4 to a fourth section (t41 to t42). The worker may finely adjust the first to fourth sections, and as a result the learning data generating device 100 may obtain timeline information matched to the sign language image.
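A simple consistency check on such sections can be sketched in Python as follows. The rule that sections appear in chronological order without overlapping and stay within the video duration is an assumption made for this sketch; the disclosure only states that the elements are matched in chronological order. The function name `validate_timeline` and the example times are hypothetical.

```python
def validate_timeline(sections: list[tuple[str, float, float]],
                      duration: float) -> bool:
    """Check that each (element, start, end) section is non-empty, lies
    within the video, and does not start before the previous one ends."""
    previous_end = 0.0
    for _element, start, end in sections:
        if not (previous_end <= start < end <= duration):
            return False
        previous_end = end
    return True


# Hypothetical sections for E1..E4 on a 4-second sign language video.
sections = [
    ("E1", 0.0, 0.9),  # t11 to t12
    ("E2", 1.0, 1.8),  # t21 to t22
    ("E3", 2.0, 2.7),  # t31 to t32
    ("E4", 3.0, 3.9),  # t41 to t42
]
print(validate_timeline(sections, duration=4.0))  # True
```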

FIG. 6B is an example of the worker terminal 400 running the labeling authoring tool of FIG. 6A. Referring to FIG. 6B, for the Korean sentence “반품할 상품이 있는데 기사님이 방문하시는 거 말고 직접 보내도 되나요?” (“I have an item to return; can I send it myself instead of having the courier visit?”), the learning data generating device 100 may obtain a sign language image and a plurality of semantic elements: “반품” (return), “물건” (item), “있다” (exist), “그런데” (but), “사람” (person), “오다” (come), “말다(중단)” (stop), “직접” (directly), “보내다” (send), “가능” (possible), and “?”. When the worker runs the labeling authoring tool through the worker terminal 400, the sign language image is played and the plurality of semantic elements may be automatically arranged over time (for example, as dominant-hand gloss items). By rearranging the semantic elements on the timeline, the worker may generate token information in which morpheme information and time information are combined. Depending on the embodiment, the worker terminal 400 may generate non-dominant-hand gloss data and non-manual data based on the worker's input signal.
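The automatic initial arrangement could be as simple as the following Python sketch, which spreads the semantic elements evenly across the video duration so that the worker only has to adjust them. Even spacing is an assumption; the disclosure does not say how the initial positions are chosen. The function name `auto_arrange` and the 11-second duration are hypothetical.

```python
def auto_arrange(elements: list[str],
                 duration: float) -> list[tuple[str, float, float]]:
    """Assign each element an equal-length slot on the timeline, in order."""
    if not elements:
        return []
    slot = duration / len(elements)
    return [(e, i * slot, (i + 1) * slot) for i, e in enumerate(elements)]


elements = ["반품", "물건", "있다", "그런데", "사람", "오다",
            "말다(중단)", "직접", "보내다", "가능", "?"]
placed = auto_arrange(elements, duration=11.0)
print(placed[0])   # ('반품', 0.0, 1.0)
print(placed[-1])  # ('?', 10.0, 11.0)
```

The worker would then drag each slot to the section of the video where the corresponding sign is actually performed.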

That is, the learning data generating device 100 may provide the labeling authoring tool through the worker terminal 400, and the worker may perform the labeling task in a time-efficient manner by using it. Specifically, when the worker runs the labeling authoring tool, the sign language image and the morpheme information are arranged automatically, and the worker may generate tokens by using a mouse or similar device to rearrange the morpheme information on the timeline so that it matches the sign language image. As a result, the working time for token generation may be shortened by a factor of two or more compared with the conventional approach.

Referring again to FIG. 5, in step S152, the learning data generating device 100 may receive mapping information from the worker terminal 400. Depending on the embodiment, the mapping information may include matching information in which the worker terminal 400 has matched the sign language image with one or more semantic elements on a timeline basis using the labeling authoring tool of FIG. 6A.

In step S153, the learning data generating device 100 may generate a token based on the mapping information. A token may be a learning data product in which information about a semantic element, time information, and information about the sign language image are linked. Information about the generated token may be referred to as token information.

In step S154, the learning data generating device 100 may verify the learning data by determining the accuracy of the token information. For example, the learning data generating device 100 may complete the verification when the accuracy of the token information is greater than or equal to the reference value and, in step S155, may complete the generation of the learning data. The completed learning data may be labeled learning data in which, for example, the input is the first text data and the correct answer is a sequence of one or more semantic elements.

Depending on the embodiment, the accuracy of the token information may be determined by converting one or more semantic elements into test image data and comparing the test image data with the sign language image data. For example, the learning data generating device 100 may use a similarity determination algorithm to verify whether the test image data and the sign language image data are similar and, when their similarity is greater than or equal to a certain value, may determine that the accuracy of the token information is greater than or equal to the reference value.

FIG. 7 is a block diagram showing a sign language translation system to which a learning data generating device according to an embodiment of the present disclosure is applied. Referring to FIG. 7, the sign language translation system 1000 may include a learning data generating device 1100, a sign language translator 1200, a microphone 1300, a sound-to-text (STT) conversion module 1400, and a display 1500.

The learning data generating device described with reference to FIGS. 1 to 6B may be applied as the learning data generating device 1100. Accordingly, the learning data generating device 1100 may generate labeled learning data and provide it to the sign language translator 1200. The sign language translator 1200 may include a neural network 1250. For example, a deep neural network (DNN) may be trained on the labeled learning data to perform sign language translation of Korean sentences.

The microphone 1300 may convert the analog signal of the user's voice into a digital signal, and the STT conversion module 1400 may convert that digital signal into text data. For example, the STT conversion module 1400 may use a deep neural network that converts speech into sentences on a sequence-to-sequence basis.

The sign language translator 1200 may translate the text data into a sequence of semantic elements through the neural network 1250, fuse the sequence of semantic elements with an interactive sign language image, and provide the result to the display 1500. The display 1500 may show the interactive sign language image to the user.

As described above, the learning data generating device 100 according to the present invention may be applied to the sign language translation system 1000 and, by generating labeled learning data, may enable conversation between deaf people and people who cannot use sign language. In addition, the learning data generating device 100 may be used for kiosks (displaying sign language animation when a hearing-impaired person uses the kiosk), facility information centers, weather forecast guidance, video map services, sign language application services, and the like. Furthermore, as Korean-to-sign-language translation technology and sign language recognition technology advance, an artificial intelligence knowledge ecosystem in which artificial intelligence algorithms and services can be developed in industrial fields such as IoT, sign language interpretation, and tourism can be expected to emerge.

The foregoing describes specific embodiments for carrying out the present invention. The present invention includes not only the embodiments described above but also embodiments obtained through simple or readily made design changes. The present invention also includes techniques that can be readily modified and practiced using the embodiments. Therefore, the scope of the present invention should not be limited to the embodiments described above, but should be determined by the claims set forth below and their equivalents.

10: Sign language learning system
100: Learning data generating device
200: Sign language translation device
300: Sign language image capturing device
400: Worker terminal

Claims

A method of operating a learning data generating device that generates learning data for artificial intelligence-based sign language translation, the method comprising:
receiving first text data composed of a Korean sentence based on a user input signal;
requesting a sign language translation device to translate the first text data into second text data composed of a sign language sentence formed in a grammatical system different from that of the Korean sentence;
upon receiving the second text data from the sign language translation device, obtaining one or more semantic elements from the second text data;
obtaining sign language image data for the first text data by requesting the sign language image data for the first text data from a photographing device;
generating learning data based on the one or more semantic elements and the sign language image data; and
storing the learning data,
wherein the generating of the learning data further comprises:
transmitting the one or more semantic elements and the sign language image data to a worker terminal;
automatically arranging the one or more semantic elements and the sign language image data over time by providing a labeling authoring tool;
receiving, from the worker terminal through the labeling authoring tool, mapping information in which the one or more semantic elements are rearranged in the sign language image data based on a timeline; and
generating token information based on the mapping information.

The method of claim 1,
wherein the obtaining of the one or more semantic elements further comprises separating the second text data, through morphological analysis of the second text data, into the one or more semantic elements, each being a minimum semantic unit of a sign language.

The method of claim 1,
wherein the obtaining of the sign language image data further comprises
receiving, from a photographing device including a camera, the sign language image data captured based on the first text data and the second text data.

(Deleted)

The method of claim 1,
wherein the storing of the learning data further comprises, when the accuracy of the token information is greater than a reference value, creating a learning database based on the learning data in which the input is the first text data and the correct answer is a sequence of the one or more semantic elements.

A learning data generating device comprising:
an interface configured to transmit and receive data to and from external devices;
a sensor configured to acquire sign language image data; and
a processor,
wherein the processor is configured to:
receive first text data composed of a Korean sentence based on a user input signal;
request a sign language translation device among the external devices to translate the first text data into second text data composed of a sign language sentence formed in a grammatical system different from that of the Korean sentence;
upon receiving the second text data from the sign language translation device, obtain one or more semantic elements from the second text data;
request the sign language image data for the first text data from the sensor and receive the sign language image data;
generate learning data based on the one or more semantic elements and the sign language image data; and
create a database by storing the learning data,
and wherein the processor is further configured to:
transmit the one or more semantic elements and the sign language image data to a worker terminal;
automatically arrange the one or more semantic elements and the sign language image data over time by providing a labeling authoring tool;
receive, from the worker terminal through the labeling authoring tool, mapping information in which the one or more semantic elements are rearranged in the sign language image data based on a timeline; and
generate token information based on the mapping information.