KR102636708B1

KR102636708B1 - Electronic terminal apparatus which is able to produce a sign language presentation video for a presentation document, and the operating method thereof

Info

Publication number: KR102636708B1
Application number: KR1020220116975A
Authority: KR
Inventors: 김미향
Original assignee: 주식회사 한글과컴퓨터
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2024-02-14

Abstract

The present invention provides an electronic terminal device capable of producing a sign language presentation video for a presentation document, and a method of operating the same, so that hearing-impaired people can more easily understand the contents of the presentation document through the sign language presentation video.

Description

Electronic terminal device capable of producing sign language presentation video for presentation document and its operating method {ELECTRONIC TERMINAL APPARATUS WHICH IS ABLE TO PRODUCE A SIGN LANGUAGE PRESENTATION VIDEO FOR A PRESENTATION DOCUMENT, AND THE OPERATING METHOD THEREOF}

The present invention relates to an electronic terminal device capable of producing a sign language presentation video for a presentation document, and to a method of operating the same.

Recently, as computers, smartphones, and tablet PCs have become widely available, various electronic document programs have been released that allow users to view, create, and edit electronic documents on these electronic terminal devices. These programs include word processors that support basic document creation and editing, spreadsheets that assist with data entry, arithmetic operations, and data management, and presentation programs that assist presenters in giving presentations.

In particular, because presentation programs are mainly used to create documents for presentations, users often produce recorded presentation videos of the slides so that readers can easily understand the contents of the slides that make up the presentation document.

Such recorded presentation videos are very useful because they help readers intuitively understand the content of the corresponding slide.

However, because these recorded videos convey their content mainly through audio, hearing-impaired people have difficulty understanding the contents of the slides from the recorded videos alone.

If a technology were introduced that extracts the voice from the recorded video of a slide, converts the extracted voice into text, and produces a sign language presentation video for the slide based on that text, hearing-impaired people could also easily understand the contents of the slide through the sign language presentation video.

Therefore, research is needed on technology that can produce sign language presentation videos for presentation documents.

The present invention aims to provide an electronic terminal device capable of producing a sign language presentation video for a presentation document, and a method of operating the same, so that hearing-impaired people can more easily understand the contents of the presentation document through the sign language presentation video.

An electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention includes: a recorded video generation unit that, when a user issues a command to record a presentation video for a presentation document, receives the user's presentation voice for each of a plurality of slides constituting the presentation document through a microphone mounted on the electronic terminal device, and generates a recorded video for each of the plurality of slides based on the presentation voice; a text conversion unit that, when the user issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, extracts the voice from a first recorded video for the first slide and converts the extracted voice into text by performing speech recognition on it based on a preset STT (Speech To Text) model; an avatar video generation unit that, when the extracted voice has been converted into the text, generates a first avatar video whose motion expresses the text in sign language; a sign language video generation unit that, when the first avatar video has been generated, generates a sign language video by inserting the first avatar video at a preset insertion position of the first recorded video and compositing them into a single video; and a video insertion unit that inserts the recorded video for each slide into that slide as an object, and additionally inserts the sign language video into the first slide as an object.

In addition, a method of operating an electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention includes: when a user issues a command to record a presentation video for a presentation document, receiving the user's presentation voice for each of a plurality of slides constituting the presentation document through a microphone mounted on the electronic terminal device, and generating a recorded video for each of the plurality of slides based on the presentation voice; when the user issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, extracting the voice from a first recorded video for the first slide and converting the extracted voice into text by performing speech recognition on it based on a preset STT model; when the extracted voice has been converted into the text, generating a first avatar video whose motion expresses the text in sign language; when the first avatar video has been generated, generating a sign language video by inserting the first avatar video at a preset insertion position of the first recorded video and compositing them into a single video; and inserting the recorded video for each slide into that slide as an object, and additionally inserting the sign language video into the first slide as an object.

FIG. 1 is a diagram showing the structure of an electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention.
FIGS. 2 and 3 are diagrams for explaining the operation of an electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention.
FIG. 4 is a flowchart showing a method of operating an electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. This description is not intended to limit the present invention to specific embodiments, and should be understood to cover all changes, equivalents, and substitutes that fall within the spirit and technical scope of the present invention. In describing the drawings, similar reference numerals are used for similar components. Unless otherwise defined, all terms used in this specification, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains.

In this document, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, in various embodiments of the present invention, each component, functional block, or means may be composed of one or more subcomponents. The electrical, electronic, and mechanical functions performed by each component may be implemented with various known devices or mechanical elements, such as electronic circuits, integrated circuits, and ASICs (Application Specific Integrated Circuits), and may be implemented separately or with two or more integrated into one.

Meanwhile, the blocks of the accompanying block diagrams and the steps of the flowcharts may be interpreted as computer program instructions that are loaded into the processor or memory of equipment capable of data processing, such as a general-purpose computer, a special-purpose computer, a portable laptop computer, or a network computer, and that perform designated functions. Because these computer program instructions can be stored in a memory provided in a computer device or in a computer-readable memory, the functions described in the blocks of a block diagram or the steps of a flowchart can also be produced as an article of manufacture containing instruction means that perform them. In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may be executed out of the stated order. For example, two blocks or steps shown in succession may be performed substantially simultaneously or in reverse order, and in some cases some blocks or steps may be omitted.

FIG. 1 is a diagram showing the structure of an electronic terminal device capable of producing a sign language presentation video for a presentation document according to an embodiment of the present invention.

Referring to FIG. 1, the electronic terminal device 110 according to an embodiment of the present invention includes a recorded video generation unit 111, a text conversion unit 112, an avatar video generation unit 113, a sign language video generation unit 114, and a video insertion unit 115.

When the user 130 issues a command to record a presentation video for a presentation document, the recorded video generation unit 111 receives the presentation voice of the user 130 for each of a plurality of slides constituting the presentation document through a microphone mounted on the electronic terminal device 110, and generates a recorded video for each of the plurality of slides based on the presentation voice.

When the user 130 issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, the text conversion unit 112 extracts the voice from a first recorded video for the first slide and converts the extracted voice into text by performing speech recognition on it based on a preset STT (Speech To Text) model.

When the extracted voice has been converted into the text, the avatar video generation unit 113 generates a first avatar video whose motion expresses the text in sign language.

In this case, according to an embodiment of the present invention, the avatar video generation unit 113 may include, as a specific configuration for generating the first avatar video, an avatar video clip storage unit 116, a word extraction unit 117, a word matching unit 118, and a generation processing unit 119.

The avatar video clip storage unit 116 stores a plurality of preset representative words for sign language translation, each in correspondence with a pre-produced avatar video clip whose motion expresses that representative word in sign language.

For example, the avatar video clip storage unit 116 may store the plurality of preset representative words for sign language translation and the corresponding pre-produced avatar video clips as shown in Table 1 below.

[Table 1]
Representative word    Avatar video clip
Word 1                 Avatar video clip 1
Word 2                 Avatar video clip 2
Word 3                 Avatar video clip 3
...                    ...
Word 10                Avatar video clip 10

When the extracted voice has been converted into the text by the text conversion unit 112, the word extraction unit 117 extracts from the text at least one word constituting the text.

When the at least one word has been extracted, the word matching unit 118 matches each of the at least one word with one similar representative word, namely the representative word among the plurality of representative words that has the highest similarity to that word.

In this case, according to an embodiment of the present invention, the word matching unit 118 may include, as a specific configuration for matching each of the at least one word with the similar representative word having the highest similarity to it, a dictionary database 120, a similarity calculation unit 121, and a matching processing unit 122.

The dictionary database 120 stores a plurality of preset words and a preset embedding vector corresponding to each of the plurality of words. The embedding vector corresponding to each word is a vector preset on the basis of the similarity between words, such that the more similar the meanings of two words are, the greater the vector similarity calculated between their embedding vectors.

Here, the cosine similarity according to Equation 1 below, the Euclidean distance according to Equation 2 below, or the like may be used as the vector similarity.

[Equation 1]
$$S = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}}\,\sqrt{\sum_{i} B_i^{2}}}$$

In Equation 1, S is the cosine similarity between vector A and vector B and takes a value between -1 and 1, A_i is the i-th component of vector A, and B_i is the i-th component of vector B. The greater the cosine similarity between two vectors, the more similar the two vectors can be considered to be.

[Equation 2]
$$D = \sqrt{\sum_{i} (A_i - B_i)^{2}}$$

In Equation 2, D is the Euclidean distance between vector A and vector B, A_i is the i-th component of vector A, and B_i is the i-th component of vector B. The smaller the Euclidean distance between two vectors, the more similar the two vectors can be considered to be.
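The two measures described for Equation 1 and Equation 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are assumptions, and the source does not prescribe any particular implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity S of Equation 1; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance D of Equation 2; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that the two measures point in opposite directions: a matcher built on cosine similarity picks the maximum, whereas one built on Euclidean distance would pick the minimum.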

For example, the dictionary database 120 may store the plurality of preset words and the preset embedding vector corresponding to each of the plurality of words as shown in Table 2 below.

[Table 2]
Word         Embedding vector
Word 1       Vector 1
Word 2       Vector 2
Word 3       Vector 3
...          ...
Word 1000    Vector 1000

When the at least one word has been extracted by the word extraction unit 117, the similarity calculation unit 121 refers to the dictionary database 120 and, for each of the at least one word, calculates the vector similarity between the embedding vector corresponding to that word and the embedding vector corresponding to each of the plurality of representative words, thereby calculating the similarity of each of the at least one word to the plurality of representative words.

The matching processing unit 122 matches each of the at least one word with one similar representative word, namely the representative word among the plurality of representative words that has the highest similarity to that word.

For each of the at least one word, the generation processing unit 119 extracts from the avatar video clip storage unit 116 the avatar video clip corresponding to the similar representative word matched to that word, thereby obtaining an avatar video clip for each of the at least one word. It then generates the first avatar video by concatenating these avatar video clips in the order in which the words appear in the text.

Hereinafter, the operations of the recorded video generation unit 111, the text conversion unit 112, the avatar video clip storage unit 116, the word extraction unit 117, the similarity calculation unit 121, the matching processing unit 122, and the generation processing unit 119 are described in detail using an example.

First, assume that the user 130 has issued to the electronic terminal device 110 a command to record a presentation video for a presentation document. Assume also that the presentation document consists of a plurality of slides, 'Slide 1, Slide 2, Slide 3, Slide 4, Slide 5'.

Then, the recorded video generation unit 111 may receive the presentation voice of the user 130 for each of 'Slide 1, Slide 2, Slide 3, Slide 4, Slide 5' through the microphone mounted on the electronic terminal device 110 and, based on the presentation voice, generate the recorded videos for 'Slide 1, Slide 2, Slide 3, Slide 4, Slide 5' as 'Recorded video 1, Recorded video 2, Recorded video 3, Recorded video 4, Recorded video 5', respectively.

Now assume that the user 130 has issued to the electronic terminal device 110 a command to additionally generate a sign language video for 'Slide 2'.

Then, the text conversion unit 112 may extract the voice from 'Recorded video 2', which is the first recorded video for 'Slide 2', and convert the extracted voice into text by performing speech recognition on it based on the preset STT model.

Next, the word extraction unit 117 may extract from the text at least one word constituting the text.

Assume that, as a result, 'Word 12, Word 13, Word 15' have been extracted as the at least one word.

Then, referring to the dictionary database 120 as shown in Table 2, the similarity calculation unit 121 may calculate, for each of 'Word 12, Word 13, Word 15', the vector similarity between the embedding vector corresponding to that word and the embedding vector corresponding to each of the representative words 'Word 1, Word 2, Word 3, ..., Word 10', thereby calculating the similarity of each of 'Word 12, Word 13, Word 15' to 'Word 1, Word 2, Word 3, ..., Word 10'. That is, the similarity calculation unit 121 may calculate the similarity of 'Word 12' to each of 'Word 1, Word 2, Word 3, ..., Word 10', and may do the same for 'Word 13' and for 'Word 15'.

Assume that, as a result, among 'Word 1, Word 2, Word 3, ..., Word 10', the similar representative word with the highest similarity to 'Word 12' is 'Word 4', the one with the highest similarity to 'Word 13' is 'Word 1', and the one with the highest similarity to 'Word 15' is 'Word 8'.

Then, the matching processing unit 122 may match 'Word 12, Word 13, Word 15' with 'Word 4, Word 1, Word 8', respectively.

Here, in the avatar video clip storage unit 116 as shown in Table 1, the avatar video clips corresponding to 'Word 4, Word 1, Word 8', the similar representative words matched to 'Word 12, Word 13, Word 15' respectively, are 'Avatar video clip 4, Avatar video clip 1, Avatar video clip 8'. The generation processing unit 119 may therefore extract 'Avatar video clip 4, Avatar video clip 1, Avatar video clip 8' from the avatar video clip storage unit 116 as the avatar video clips corresponding to 'Word 12, Word 13, Word 15', respectively.

Next, the generation processing unit 119 may generate 'Avatar video 1' by concatenating 'Avatar video clip 4, Avatar video clip 1, Avatar video clip 8', the avatar video clips corresponding to 'Word 12, Word 13, Word 15' respectively, in the order in which the words appear in the text.

For instance, if 'Word 12, Word 13, Word 15' appear in the text in the order 'Word 12, Word 13, Word 15', the generation processing unit 119 may generate 'Avatar video 1' by concatenating the clips in the order 'Avatar video clip 4, Avatar video clip 1, Avatar video clip 8'.
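The matching and concatenation walked through in this example can be sketched as follows. The two-dimensional embedding values are invented solely so that the toy data reproduces the example's outcome, only three of the ten representative words are included, and all function and variable names are hypothetical. Cosine similarity is used as the vector similarity.

```python
import math

# Hypothetical embedding vectors standing in for the dictionary database (Table 2).
DICTIONARY = {
    "Word 1": (1.0, 0.0),
    "Word 4": (0.0, 1.0),
    "Word 8": (-1.0, 0.0),
    "Word 12": (0.1, 0.9),
    "Word 13": (0.9, 0.1),
    "Word 15": (-0.8, 0.2),
}

# Subset of the representative-word-to-clip mapping (Table 1).
CLIPS = {
    "Word 1": "Avatar video clip 1",
    "Word 4": "Avatar video clip 4",
    "Word 8": "Avatar video clip 8",
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def match_representative(word):
    """Return the representative word whose embedding is most similar to `word`."""
    vector = DICTIONARY[word]
    return max(CLIPS, key=lambda rep: cosine_similarity(vector, DICTIONARY[rep]))


def build_clip_sequence(words_in_text_order):
    """Look up one clip per word and keep the order in which the words appear."""
    return [CLIPS[match_representative(word)] for word in words_in_text_order]


sequence = build_clip_sequence(["Word 12", "Word 13", "Word 15"])
print(sequence)
# ['Avatar video clip 4', 'Avatar video clip 1', 'Avatar video clip 8']
```

Joining the actual video clips is left out here; the list only records which clips would be concatenated and in what order.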

When the first avatar video has been generated by the avatar video generation unit 113 in this way, the sign language video generation unit 114 may generate a sign language video by inserting the first avatar video at the preset insertion position of the first recorded video and compositing them into a single video.

In this case, according to an embodiment of the present invention, when the first avatar video has been generated by the avatar video generation unit 113, the sign language video generation unit 114 may identify the time of a first start point at which the voice begins in the first recorded video and the time of a first end point at which the voice ends. Based on these two times, it may adjust the playback speed of the first avatar video so that the first avatar video starts at the first start point and ends at the first end point, and then generate the sign language video by inserting the speed-adjusted first avatar video at the preset insertion position of the first recorded video and compositing them into a single video.

The video insertion unit 115 inserts the recorded video for each slide into that slide as an object, and additionally inserts the sign language video into the first slide as an object.
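The insertion rule can be illustrated with a small data model. The `Slide` class and the `insert_videos` function are hypothetical names introduced only for this sketch; the source does not describe how slide objects are represented.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """A slide and the video objects embedded in it (hypothetical model)."""
    name: str
    video_objects: list[str] = field(default_factory=list)


def insert_videos(slides, recordings, sign_language_videos):
    """Insert one recorded video per slide, plus a sign language video where one exists.

    `sign_language_videos` maps a zero-based slide index to the sign language
    video generated for that slide.
    """
    if len(slides) != len(recordings):
        raise ValueError("each slide needs exactly one recorded video")
    for index, slide in enumerate(slides):
        slide.video_objects.append(recordings[index])
        extra = sign_language_videos.get(index)
        if extra is not None:
            slide.video_objects.append(extra)
    return slides
```

With five slides and a sign language video requested only for 'Slide 2' (index 1), every slide ends up with its recorded video and 'Slide 2' alone carries a second video object.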

Hereinafter, the operations of the sign language video generation unit 114 and the video insertion unit 115 are described in detail using an example.

First, assume that 'Avatar video 1' has been generated by the avatar video generation unit 113 as in the example above.

Then, the sign language video generator 114 may identify, in 'Recorded Video 2' (the recorded video for 'Slide 2'), the time of the first start point at which the voice begins and the time of the first end point at which the voice ends.

Suppose that, as a result, the time of the first start point is '5 seconds' and the time of the first end point is '1 minute 47 seconds'. Based on these two times, the sign language video generator 114 may adjust the playback speed of 'Avatar Video 1' so that 'Avatar Video 1' starts at '5 seconds' and ends at '1 minute 47 seconds'.
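The speed adjustment in this example can be illustrated with a short calculation. The following is a minimal sketch, not the patented implementation: the function name `playback_speed_factor` and the avatar video length of 127.5 seconds are assumptions made for illustration, since the description does not state how long 'Avatar Video 1' is before adjustment.

```python
def playback_speed_factor(avatar_duration_s: float, start_s: float, end_s: float) -> float:
    """Return the speed multiplier that makes the avatar video span [start_s, end_s]."""
    target_duration_s = end_s - start_s
    if avatar_duration_s <= 0 or target_duration_s <= 0:
        raise ValueError("durations must be positive")
    # A factor above 1 speeds the avatar video up; below 1 slows it down.
    return avatar_duration_s / target_duration_s


# Times from the example: voice starts at 5 s and ends at 1 min 47 s (107 s).
start_s, end_s = 5.0, 107.0
# Hypothetical unadjusted length of 'Avatar Video 1' (not given in the source).
avatar_duration_s = 127.5

print(playback_speed_factor(avatar_duration_s, start_s, end_s))  # 1.25
```

Under these assumed numbers, the avatar video would be played at 1.25 times its original speed so that it fits the 102-second voice interval.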

Then, the sign language video generator 114 may generate the sign language video by inserting the speed-adjusted 'Avatar Video 1' at the preset insertion position of 'Recorded Video 2' and compositing them into a single video.

For instance, if 'Avatar Video 1' is as shown at reference numeral 210 in FIG. 2 and 'Recorded Video 2' is as shown at reference numeral 220 in FIG. 2, the sign language video generator 114 may generate the sign language video by inserting 'Avatar Video 1 (210)' at the preset insertion position of 'Recorded Video 2 (220)' and compositing them into a single video, as shown at reference numeral 230 in FIG. 2.

Then, the video insertion unit 115 may insert, into each of 'Slide 1, Slide 2, Slide 3, Slide 4, and Slide 5', the recorded video for that slide as an object, and may additionally insert the sign language video as an object into 'Slide 2'.

For example, as shown at reference numeral 310 in FIG. 3, the video insertion unit 115 may insert the recorded video for each slide as an object 301 into each of 'Slide 1, Slide 2, Slide 3, Slide 4, and Slide 5'; in 'Slide 2', as shown at reference numeral 320, it may additionally insert the sign language video as an object 302.

According to an embodiment of the present invention, the electronic terminal device 110 may further include a playback query unit 123, a video selection query unit 124, and a video playback unit 125.

After the video insertion unit 115 has inserted the recorded video for each slide as an object into the plurality of slides and has additionally inserted the sign language video as an object into the first slide, when the user 130 issues a slideshow execution command for the presentation document, the playback query unit 123 displays on the screen a query message asking whether to play the recorded videos inserted as objects in the plurality of slides.

When the user 130 issues a playback command instructing that the recorded videos be played, the video selection query unit 124 sequentially plays the recorded videos inserted as objects in the plurality of slides, following the page order of the slides, and displays them on the screen. When it is the turn of the first recorded video inserted in the first slide, the video selection query unit 124 displays on the screen a video selection query message asking whether to play the first recorded video or the sign language video.

The video playback unit 125 plays the first recorded video when the user 130 issues a recorded video playback command instructing that the first recorded video be played, and plays the sign language video when the user 130 issues a sign language video playback command instructing that the sign language video be played.

Hereinafter, the operations of the playback query unit 123, the video selection query unit 124, and the video playback unit 125 are described in detail by way of example.

First, assume that, as in the example above, the video insertion unit 115 has inserted the recorded video for each slide as an object into each of 'Slide 1, Slide 2, Slide 3, Slide 4, and Slide 5' and has additionally inserted the sign language video as an object into 'Slide 2', and that the user 130 has subsequently issued a slideshow execution command for the presentation document to the electronic terminal device 110.

Then, the playback query unit 123 may display on the screen a query message asking whether to play the recorded videos inserted as objects in 'Slide 1, Slide 2, Slide 3, Slide 4, and Slide 5'.

Assume that the user 130 then issues, to the electronic terminal device 110, a playback command instructing that the recorded videos be played.

Then, the video selection query unit 124 may sequentially play 'Recorded Video 1, Recorded Video 2, Recorded Video 3, Recorded Video 4, and Recorded Video 5', which are the recorded videos inserted as objects in 'Slide 1, Slide 2, Slide 3, Slide 4, and Slide 5', following the page order of the slides, and display them on the screen.

When it is the turn of 'Recorded Video 2' inserted in 'Slide 2', the video selection query unit 124 may display on the screen a video selection query message asking whether to play 'Recorded Video 2' or the sign language video.

If the user 130 then issues, to the electronic terminal device 110, a recorded video playback command instructing that 'Recorded Video 2' be played, the video playback unit 125 may play 'Recorded Video 2'. If the user 130 instead issues a sign language video playback command instructing that the sign language video be played, the video playback unit 125 may play the sign language video.
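The playback flow described above can be sketched as follows. This is an illustrative sketch only: the `Slide` class, the `slideshow_playlist` function, the two callbacks standing in for the on-screen query messages, and the file names are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slide:
    recorded_video: str                        # recorded video object on the slide
    sign_language_video: Optional[str] = None  # present only on the "first slide"


def slideshow_playlist(
    slides: List[Slide],
    confirm_playback: Callable[[], bool],
    choose_video: Callable[[Slide], str],
) -> List[str]:
    """Return the videos to play, in page order, after the two user queries."""
    # Query 1: does the user want the recorded videos played at all?
    if not confirm_playback():
        return []
    playlist = []
    for slide in slides:  # page order
        if slide.sign_language_video is not None:
            # Query 2: recorded video or sign language video for this slide?
            choice = choose_video(slide)  # expected to return "recorded" or "sign"
            if choice == "sign":
                playlist.append(slide.sign_language_video)
                continue
        playlist.append(slide.recorded_video)
    return playlist


# Five slides as in the example; only 'Slide 2' carries a sign language video.
slides = [Slide(f"recorded_{i}.mp4") for i in range(1, 6)]
slides[1].sign_language_video = "sign_language.mp4"

print(slideshow_playlist(slides, lambda: True, lambda s: "sign"))
```

Passing the queries as callbacks keeps the selection logic separate from whatever dialog a presentation program would actually display.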

FIG. 4 is a flowchart illustrating a method of operating an electronic terminal device capable of producing a sign language presentation video for a presentation document, according to an embodiment of the present invention.

In step S410, when a user issues a command to record a presentation video for a presentation document, the user's presentation voice for each of a plurality of slides constituting the presentation document is received through a microphone mounted on the electronic terminal device, and a recorded video for each of the plurality of slides is generated based on the presentation voice.

In step S420, when the user issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, voice is extracted from a first recorded video for the first slide, and the extracted voice is converted into text by performing speech recognition on it based on a preset STT (Speech To Text) model.

In step S430, when the extracted voice has been converted into the text, a first avatar video having motions that express the text in sign language is generated.

In step S440, when the first avatar video has been generated, a sign language video is generated by inserting the first avatar video at a preset insertion position of the first recorded video and compositing them into a single video.

In step S450, the recorded video for each slide is inserted as an object into each of the plurality of slides, and the sign language video is additionally inserted as an object into the first slide.

Here, according to an embodiment of the present invention, step S430 may include: maintaining an avatar video clip storage unit in which a plurality of preset representative words for sign language translation and pre-produced avatar video clips, each having motions that express the corresponding representative word in sign language, are stored in correspondence with each other; when the extracted voice has been converted into the text, extracting from the text at least one word constituting the text; when the at least one word has been extracted, matching each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word; and, for each of the at least one word, extracting from the avatar video clip storage unit the avatar video clip corresponding to the similar representative word matched to that word, and then generating the first avatar video by concatenating the extracted avatar video clips in the order in which the words appear in the text.
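The sub-steps of step S430 can be sketched as a lookup followed by an ordered concatenation. The sketch below is hedged in several ways: whitespace tokenization is an assumption (the source does not say how words are extracted), the clip storage is modeled as a plain dictionary of hypothetical file names, the matcher is a toy stand-in for the maximum-similarity matching step, and the result is a list of clip names rather than an actual concatenated video.

```python
from typing import Callable, Dict, List


def build_avatar_clip_sequence(
    text: str,
    clip_store: Dict[str, str],
    match_representative: Callable[[str], str],
) -> List[str]:
    """Return one clip per word of `text`, in the order the words appear."""
    words = text.split()  # assumption: words are separated by whitespace
    # Map each word to its matched representative word, then to that word's clip.
    return [clip_store[match_representative(word)] for word in words]


# Hypothetical clip storage: representative word -> pre-produced avatar clip.
clip_store = {"hello": "clip_hello.mp4", "everyone": "clip_everyone.mp4"}

# Hypothetical matcher standing in for the maximum-similarity matching step.
synonyms = {"hi": "hello", "all": "everyone"}


def toy_match(word: str) -> str:
    word = word.lower()
    return synonyms.get(word, word)


print(build_avatar_clip_sequence("Hi everyone", clip_store, toy_match))
# ['clip_hello.mp4', 'clip_everyone.mp4']
```

The returned list preserves word order, so concatenating the listed clips in sequence would yield the avatar video for the whole text.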

Here, according to an embodiment of the present invention, the matching step may include: maintaining a dictionary database in which a plurality of preset words and a preset embedding vector corresponding to each of the plurality of words are stored (the embedding vectors are preset based on the similarity between words, such that the more similar the meanings of two words are, the larger the vector similarity computed between their embedding vectors); when the at least one word has been extracted, calculating, with reference to the dictionary database, the vector similarity between the embedding vector corresponding to each of the at least one word and the embedding vector corresponding to each of the plurality of representative words, thereby calculating the similarity between each of the at least one word and the plurality of representative words; and matching each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word.
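The embedding-based matching can be illustrated as follows. The source only speaks of a "vector similarity"; cosine similarity is used here as one common choice and is an assumption, as are the function names and the two-dimensional toy embedding vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_representative(
    word: str,
    dictionary_db: Dict[str, List[float]],
    representative_words: List[str],
) -> str:
    """Return the representative word whose embedding is most similar to `word`'s."""
    word_vec = dictionary_db[word]
    return max(
        representative_words,
        key=lambda rep: cosine_similarity(word_vec, dictionary_db[rep]),
    )


# Toy dictionary database: word -> preset embedding vector (values are made up).
dictionary_db = {
    "hi": [0.9, 0.1],
    "hello": [1.0, 0.0],
    "goodbye": [0.0, 1.0],
}

print(match_representative("hi", dictionary_db, ["hello", "goodbye"]))  # hello
```

In this toy data, "hi" points almost in the same direction as "hello", so "hello" is selected as its similar representative word.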

In addition, according to an embodiment of the present invention, in step S440, when the first avatar video has been generated, the time of a first start point at which the voice begins and the time of a first end point at which the voice ends in the first recorded video may be identified. Based on these two times, the playback speed of the first avatar video may be adjusted so that the first avatar video starts at the first start point and ends at the first end point, and the sign language video may then be generated by inserting the speed-adjusted first avatar video at the preset insertion position of the first recorded video and compositing them into a single video.

In addition, according to an embodiment of the present invention, the method of operating the electronic terminal device may further include: after the recorded video for each slide has been inserted as an object into the plurality of slides and the sign language video has been additionally inserted as an object into the first slide in step S450, when the user issues a slideshow execution command for the presentation document, displaying on the screen a query message asking whether to play the recorded videos inserted as objects in the plurality of slides; when the user issues a playback command instructing that the recorded videos be played, sequentially playing the recorded videos inserted as objects in the plurality of slides, following the page order of the slides, and displaying them on the screen, and, when it is the turn of the first recorded video inserted in the first slide, displaying on the screen a video selection query message asking whether to play the first recorded video or the sign language video; and playing the first recorded video when the user issues a recorded video playback command instructing that the first recorded video be played, and playing the sign language video when the user issues a sign language video playback command instructing that the sign language video be played.

The method of operating an electronic terminal device according to an embodiment of the present invention has been described above with reference to FIG. 4. Because this method may correspond to the configuration and operation of the electronic terminal device 110 described with reference to FIGS. 1 to 3, a more detailed description is omitted.

The method of operating an electronic terminal device according to an embodiment of the present invention may be implemented as a computer program stored in a storage medium, to be executed in combination with a computer.

In addition, the method of operating an electronic terminal device according to an embodiment of the present invention may be implemented in the form of computer program instructions to be executed in combination with a computer and may be recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

As described above, the present invention has been explained with reference to specific details, such as particular components, and to limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention; the present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Accordingly, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set out below but also everything that is equal or equivalent to those claims falls within the scope of the spirit of the present invention.

110: Electronic terminal device
111: Recorded video generator 112: Text conversion unit
113: Avatar video generator 114: Sign language video generator
115: Video insertion unit 116: Avatar video clip storage unit
117: Word extraction unit 118: Word matching unit
119: Generation processing unit 120: Dictionary database
121: Similarity calculation unit 122: Matching processing unit
123: Playback query unit 124: Video selection query unit
125: Video playback unit
130: User

Claims

An electronic terminal device capable of producing a sign language presentation video for a presentation document, the electronic terminal device comprising:
a recorded video generator that, when a user issues a command to record a presentation video for the presentation document, receives the user's presentation voice for each of a plurality of slides constituting the presentation document through a microphone mounted on the electronic terminal device, and generates a recorded video for each of the plurality of slides based on the presentation voice;
a text conversion unit that, when the user issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, extracts voice from a first recorded video for the first slide and converts the extracted voice into text by performing speech recognition on the extracted voice based on a preset STT (Speech To Text) model;
an avatar video generator that, when the extracted voice has been converted into the text, generates a first avatar video having motions that express the text in sign language;
a sign language video generator that, when the first avatar video has been generated, generates a sign language video by inserting the first avatar video at a preset insertion position of the first recorded video and compositing them into a single video; and
a video insertion unit that inserts, into each of the plurality of slides, the recorded video for that slide as an object, and additionally inserts the sign language video as an object into the first slide.

The electronic terminal device of claim 1,
wherein the avatar video generator comprises:
an avatar video clip storage unit in which a plurality of preset representative words for sign language translation and pre-produced avatar video clips, each having motions that express the corresponding representative word in sign language, are stored in correspondence with each other;
a word extraction unit that, when the extracted voice has been converted into the text, extracts from the text at least one word constituting the text;
a word matching unit that, when the at least one word has been extracted, matches each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word; and
a generation processing unit that, for each of the at least one word, extracts from the avatar video clip storage unit the avatar video clip corresponding to the similar representative word matched to that word, and then generates the first avatar video by concatenating the extracted avatar video clips in the order in which the words appear in the text.

The electronic terminal device of claim 2,
wherein the word matching unit comprises:
a dictionary database in which a plurality of preset words and a preset embedding vector corresponding to each of the plurality of words are stored (the embedding vectors being preset based on the similarity between words, such that the more similar the meanings of two words are, the larger the vector similarity computed between their embedding vectors);
a similarity calculation unit that, when the at least one word has been extracted, calculates, with reference to the dictionary database, the vector similarity between the embedding vector corresponding to each of the at least one word and the embedding vector corresponding to each of the plurality of representative words, thereby calculating the similarity between each of the at least one word and the plurality of representative words; and
a matching processing unit that matches each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word.

The electronic terminal device of claim 1,
wherein the sign language video generator,
when the first avatar video has been generated, identifies, in the first recorded video, the time of a first start point at which the voice begins and the time of a first end point at which the voice ends; adjusts, based on the time of the first start point and the time of the first end point, the playback speed of the first avatar video so that the first avatar video starts at the first start point and ends at the first end point; and then generates the sign language video by inserting the speed-adjusted first avatar video at the preset insertion position of the first recorded video and compositing them into a single video.

The electronic terminal device of claim 1, further comprising:
a playback query unit that, after the video insertion unit has inserted the recorded video for each slide as an object into the plurality of slides and has additionally inserted the sign language video as an object into the first slide, when the user issues a slideshow execution command for the presentation document, displays on the screen a query message asking whether to play the recorded videos inserted as objects in the plurality of slides;
a video selection query unit that, when the user issues a playback command instructing that the recorded videos be played, sequentially plays the recorded videos inserted as objects in the plurality of slides, following the page order of the slides, and displays them on the screen, and that, when it is the turn of the first recorded video inserted in the first slide, displays on the screen a video selection query message asking whether to play the first recorded video or the sign language video; and
a video playback unit that plays the first recorded video when the user issues a recorded video playback command instructing that the first recorded video be played, and plays the sign language video when the user issues a sign language video playback command instructing that the sign language video be played.

A method of operating an electronic terminal device capable of producing a sign language presentation video for a presentation document, the method comprising:
when a user issues a command to record a presentation video for the presentation document, receiving the user's presentation voice for each of a plurality of slides constituting the presentation document through a microphone mounted on the electronic terminal device, and generating a recorded video for each of the plurality of slides based on the presentation voice;
when the user issues a command to additionally generate a sign language video for a first slide, which is any one of the plurality of slides, extracting voice from a first recorded video for the first slide and converting the extracted voice into text by performing speech recognition on the extracted voice based on a preset STT (Speech To Text) model;
when the extracted voice has been converted into the text, generating a first avatar video having motions that express the text in sign language;
when the first avatar video has been generated, generating a sign language video by inserting the first avatar video at a preset insertion position of the first recorded video and compositing them into a single video; and
inserting, into each of the plurality of slides, the recorded video for that slide as an object, and additionally inserting the sign language video as an object into the first slide.

The method of claim 6,
wherein generating the first avatar video comprises:
maintaining an avatar video clip storage unit in which a plurality of preset representative words for sign language translation and pre-produced avatar video clips, each having motions that express the corresponding representative word in sign language, are stored in correspondence with each other;
when the extracted voice has been converted into the text, extracting from the text at least one word constituting the text;
when the at least one word has been extracted, matching each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word; and
for each of the at least one word, extracting from the avatar video clip storage unit the avatar video clip corresponding to the similar representative word matched to that word, and then generating the first avatar video by concatenating the extracted avatar video clips in the order in which the words appear in the text.

The method of claim 7,
wherein the matching comprises:
maintaining a dictionary database in which a plurality of preset words and a preset embedding vector corresponding to each of the plurality of words are stored (the embedding vectors being preset based on the similarity between words, such that the more similar the meanings of two words are, the larger the vector similarity computed between their embedding vectors);
when the at least one word has been extracted, calculating, with reference to the dictionary database, the vector similarity between the embedding vector corresponding to each of the at least one word and the embedding vector corresponding to each of the plurality of representative words, thereby calculating the similarity between each of the at least one word and the plurality of representative words; and
matching each of the at least one word with the similar representative word that, among the plurality of representative words, has the maximum similarity to that word.

The method of claim 6,
wherein generating the sign language video comprises,
when the first avatar video has been generated, identifying, in the first recorded video, the time of a first start point at which the voice begins and the time of a first end point at which the voice ends; adjusting, based on the time of the first start point and the time of the first end point, the playback speed of the first avatar video so that the first avatar video starts at the first start point and ends at the first end point; and then generating the sign language video by inserting the speed-adjusted first avatar video at the preset insertion position of the first recorded video and compositing them into a single video.

The method of claim 6, further comprising:
after the recorded video for each slide has been inserted as an object into the plurality of slides and the sign language video has been additionally inserted as an object into the first slide by the inserting step, when the user issues a slideshow execution command for the presentation document, displaying on the screen a query message asking whether to play the recorded videos inserted as objects in the plurality of slides;
when the user issues a playback command instructing that the recorded videos be played, sequentially playing the recorded videos inserted as objects in the plurality of slides, following the page order of the slides, and displaying them on the screen, and, when it is the turn of the first recorded video inserted in the first slide, displaying on the screen a video selection query message asking whether to play the first recorded video or the sign language video; and
playing the first recorded video when the user issues a recorded video playback command instructing that the first recorded video be played, and playing the sign language video when the user issues a sign language video playback command instructing that the sign language video be played.

A computer-readable recording medium on which is recorded a computer program for executing, in combination with a computer, the method of any one of claims 6 to 10.

A computer program stored in a storage medium for executing, in combination with a computer, the method of any one of claims 6 to 10.