KR102368233B1

KR102368233B1 - Platform system for providing video communication based on plug-in and method for providing video communication voting using the same

Info

Publication number: KR102368233B1
Application number: KR1020210022009A
Authority: KR
Inventors: 박영선
Original assignee: 주식회사 라젠
Priority date: 2021-02-18
Filing date: 2021-02-18
Publication date: 2022-03-03

Abstract

A plug-in-based video communication platform system according to one technical aspect of the present invention includes: a user application which is installed on a user terminal and provides a language conversion service to a user through the user terminal; and a service server which interworks with a plurality of user applications that interwork with each other for video communication, and thereby provides the language conversion service between the plurality of user applications. To provide the language conversion service, the service server obtains voice information provided by any one user application, converts the obtained voice information into text information, and provides at least one of the converted text information or sign language motion information corresponding to the converted text information to another user application.

Description

A plug-in-based video communication platform system and a method for providing video communication using the same

The present invention relates to a plug-in-based video communication platform system and a method for providing video communication using the same.

With the development of Internet technology and the deepening of the pandemic environment, various video communication technologies are being developed.

However, conventional video communication technologies amount to little more than simple video calls, so communication and use remain difficult for people with communication difficulties, such as people with disabilities and the elderly, and in settings where the participants speak different languages.

Accordingly, there is a growing need for video communication that enables fast communication regardless of country or language, not only for socially vulnerable groups such as the hearing impaired and the elderly but for all users around the world.

Korean Laid-open Patent Publication No. 10-2015-0045335 (publication date: April 28, 2015)

One technical aspect of the present invention is intended to solve the above problems of the prior art, and provides a plug-in-based video communication platform system, and a method for providing video communication using the same, capable of providing fast communication regardless of country or language for all users around the world, not only among socially vulnerable groups such as the hearing impaired and the elderly but also between them and non-disabled users.

In addition, one technical aspect of the present invention provides a plug-in-based video communication platform system, and a method for providing video communication using the same, in which various solutions are applied in a plug-in manner so that anyone, regardless of age or gender, can conveniently carry out video communication involving content such as education, games, and healthcare.

The above objects and various advantages of the present invention will become more apparent to those skilled in the art from the preferred embodiments of the present invention.

One technical aspect of the present invention proposes a plug-in-based video communication platform system. The plug-in-based video communication platform system may include: a user application that is installed on a user terminal and provides a language conversion service to a user through the user terminal; and a service server that interworks with a plurality of user applications interworking with each other for video communication to provide the language conversion service between the plurality of user applications, and that, in order to provide the language conversion service, obtains voice information provided by any one user application, converts the obtained voice information into text information, and provides at least one of the converted text information or sign language motion information corresponding to the converted text information to another user application.

In one embodiment, the user application may receive, from the service server, at least one plug-in solution that uses the language conversion service or motion recognition, install it on the user terminal, and use the installed plug-in solution to receive from the service server a video communication service that uses the language conversion service or motion recognition.

In one embodiment, the service server may provide the user application with at least one plug-in solution that uses the language conversion service or the motion recognition, and may interwork with the plug-in solution installed in the user application to provide the language conversion service or the motion recognition for a video generated by the user terminal.

In one embodiment, the service server may include: a language translation providing unit that converts voice information of a video provided by a first user application into text, generates sign language motion information corresponding to the converted text information, and provides the converted text information or the sign language motion information to at least one second user application interworking with the first user application; and a motion recognition providing unit that obtains a first video and a second video provided by the first user application, recognizes human motion in each of the first video and the second video, and calculates a matching rate based on the similarity between the recognized motions.

In one embodiment, the service server may further include a plug-in solution providing unit that stores at least one plug-in solution based on the language conversion service or motion recognition, provides the at least one plug-in solution to the user application at the request of the user application, and interworks with the plug-in solution installed in the user application, the plug-in solution being based on the language conversion service or motion recognition.

In one embodiment, the plug-in solution may include at least one of: an edutech plug-in solution that interworks with the plurality of user applications to convert voice information into text information and, depending on the setting, generates and provides sign language motion information corresponding to the text information; a healthcare plug-in solution that recognizes a trainer's motion in a video provided by a trainer user application, recognizes another user's motion in a video provided by another user application operating in conjunction with the trainer user application, and calculates a matching rate based on the similarity of motion between the two videos to provide training video communication; and an entertainment plug-in solution that recognizes a dancer's motion in a reference video stream, recognizes another user's motion in a video provided by another user application, and calculates a matching rate based on the similarity of motion between the two videos to provide dance video communication.

In one embodiment, the language translation providing unit may include: a translation module that checks the language set in the user application and translates the text information into the set language; an STT (Speech to Text) conversion module that recognizes voice information of a video provided by the user application and converts it into text; and a TTM (Text to Motion) conversion module that generates sign language motion information corresponding to the text information.

In one embodiment, the language translation providing unit may further include at least one of: a TTS (Text to Speech) conversion module that generates and provides machine voice information corresponding to the text information based on the set language; and an MTT (Motion to Text) conversion module that recognizes sign language motion in a video provided by the user application and generates and provides text information corresponding to the recognized sign language motion.
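As a non-limiting illustration, the way the STT, translation, TTM, and TTS modules described above might be chained can be sketched in Python as follows. All names (`LanguageTranslationUnit`, `ConversionResult`, the `stub_*` functions, and the one-entry glossary) are hypothetical and are not taken from the specification; the stubs merely stand in for real speech, translation, and motion engines, and the MTT module is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConversionResult:
    text: str                               # translated text information
    sign_motion: Optional[List[str]] = None  # sign language motion information
    speech: Optional[bytes] = None           # machine voice information


# Hypothetical stand-ins; a real system would call actual engines here.
def stub_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


_GLOSSARY = {("hello", "ko"): "안녕하세요"}


def stub_translate(text: str, target_lang: str) -> str:
    # Falls back to the untranslated text when no glossary entry exists.
    return _GLOSSARY.get((text.lower(), target_lang), text)


def stub_ttm(text: str) -> List[str]:
    # One symbolic motion token per word.
    return [f"sign:{word.lower()}" for word in text.split()]


def stub_tts(text: str, lang: str) -> bytes:
    return f"{lang}:{text}".encode("utf-8")


@dataclass
class LanguageTranslationUnit:
    stt: Callable[[bytes], str]
    translate: Callable[[str, str], str]
    ttm: Callable[[str], List[str]]
    tts: Optional[Callable[[str, str], bytes]] = None  # optional module

    def convert(self, audio: bytes, target_lang: str,
                want_sign: bool = False,
                want_speech: bool = False) -> ConversionResult:
        # Voice -> text -> text in the language set by the recipient.
        text = self.translate(self.stt(audio), target_lang)
        result = ConversionResult(text=text)
        if want_sign:
            result.sign_motion = self.ttm(text)
        if want_speech and self.tts is not None:
            result.speech = self.tts(text, target_lang)
        return result


unit = LanguageTranslationUnit(stub_stt, stub_translate, stub_ttm, stub_tts)
result = unit.convert(b"hello", "ko", want_sign=True)
```

Passing the converters in as callables reflects the statement that the TTS and MTT modules are optional additions; in this sketch a unit without a TTS module simply returns no speech output.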

Another technical aspect of the present invention proposes a method for providing video communication. The method is performed by a service server that provides video communication by interworking with a user application installed on a user terminal, and may include: providing a language conversion service between a plurality of user applications by interworking with the plurality of user applications, which interwork with each other for video communication; providing the user application with at least one plug-in solution that uses the language conversion service or motion recognition; and interworking with the plug-in solution installed in the user application to provide the language conversion service or the motion recognition for a video generated by the user terminal.

In one embodiment, the step of providing the language conversion service between the plurality of user applications may include: converting voice information of a video provided by a first user application into text; generating sign language motion information corresponding to the converted text information; and providing the converted text information or the sign language motion information to at least one second user application interworking with the first user application.

In one embodiment, the step of providing the language conversion service or the motion recognition may include: obtaining a first video and a second video provided by a first user application; recognizing human motion in each of the first video and the second video; and providing motion recognition by calculating a matching rate based on the similarity between the recognized motions.
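As a non-limiting illustration, one way a matching rate could be derived from the similarity between recognized motions is sketched below in Python. The specification does not fix a similarity measure; this sketch assumes that each recognized motion is available as a per-frame vector of keypoint coordinates, uses cosine similarity, and counts the share of frame pairs above a threshold. The names `cosine_similarity` and `matching_rate` and the threshold value of 0.9 are hypothetical.

```python
import math
from typing import List, Sequence

Pose = Sequence[float]  # flattened keypoint coordinates for one frame


def cosine_similarity(a: Pose, b: Pose) -> float:
    """Cosine similarity of two pose vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("pose vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def matching_rate(first: List[Pose], second: List[Pose],
                  threshold: float = 0.9) -> float:
    """Percentage of aligned frame pairs whose similarity meets the threshold."""
    frames = min(len(first), len(second))
    if frames == 0:
        return 0.0
    # zip stops at the shorter sequence, so extra frames are ignored.
    hits = sum(1 for p, q in zip(first, second)
               if cosine_similarity(p, q) >= threshold)
    return 100.0 * hits / frames


reference = [[1.0, 0.0], [0.0, 1.0]]
attempt = [[1.0, 0.0], [1.0, 0.0]]
rate = matching_rate(reference, attempt)
```

Cosine similarity on raw coordinates is sensitive to where the person stands in the frame, so a practical system would likely normalize the poses first; that step is left out of this sketch.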

The above means for solving the problems do not enumerate all features of the present invention. Various means for solving the problems of the present invention may be understood in more detail with reference to the specific embodiments in the following detailed description.

According to one embodiment of the present invention, fast communication can be provided regardless of country or language for all users around the world, not only among socially vulnerable groups such as the hearing impaired and the elderly but also between them and non-disabled users.

In addition, according to one embodiment of the present invention, various solutions are applied in a plug-in manner, so that anyone, regardless of age or gender, can conveniently carry out video communication involving content such as education, games, and healthcare.

FIG. 1 is a diagram illustrating the configuration of a plug-in-based video communication platform system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an embodiment of the user terminal shown in FIG. 1.
FIG. 3 is a block diagram illustrating an embodiment of the user application shown in FIG. 2.
FIG. 4 is a diagram illustrating an exemplary computing operating environment of a service server according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an embodiment of a service server according to the present invention.
FIG. 6 is a block diagram illustrating an embodiment of the language conversion providing unit shown in FIG. 5.
FIG. 7 is a block diagram illustrating an embodiment of the motion recognition providing unit shown in FIG. 5.
FIG. 8 is a diagram illustrating an example of motion recognition according to an embodiment of the present invention.
FIGS. 9 to 11 are diagrams illustrating examples of plug-in solutions according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating a plug-in-based method for providing video communication according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

However, the embodiments of the present invention may be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

That is, the above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains will be able to readily practice the technical idea of the present invention. In describing the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it would unnecessarily obscure the gist of the present invention. Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components.

As used herein, a singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be omitted, and additional components or steps may be included.

In the following, various components and their sub-components are described in order to explain the system according to the present invention. These components and sub-components may be implemented in various forms, such as hardware, software, or a combination thereof. For example, each element may be implemented as an electronic configuration for performing the corresponding function, or may be software that runs on an electronic system or a functional element of such software. Alternatively, it may be implemented as an electronic configuration together with corresponding driving software.

The various techniques described herein may be implemented with hardware or software or, where appropriate, with a combination of both. As used herein, terms such as "unit", "server", and "system" may likewise be treated as equivalent to a computer-related entity, that is, hardware, a combination of hardware and software, software, or software in execution. In addition, each function executed in the system of the present invention may be organized in modules, and may be recorded in a single physical memory or distributed across two or more memories and recording media.

Various flowcharts are disclosed to describe the embodiments of the present invention, but these are for convenience in describing each step, and the steps are not necessarily performed in the order of the flowchart. That is, the steps in a flowchart may be performed simultaneously, in the order shown in the flowchart, or in the reverse of that order.

Hereinafter, various embodiments of a plug-in-based video communication platform system and a plug-in-based method for providing video communication using the same according to the present invention will be described.

FIG. 1 is a diagram for explaining a plug-in-based video communication platform system according to an embodiment of the present invention.

Referring to FIG. 1, a plug-in-based video communication platform system (hereinafter referred to as a "video communication platform system") includes a plurality of user terminals 100 and a service server 300.

A user carries a user terminal 100 and may receive a video communication service through the user terminal 100. For example, the user may receive a user application from the service server 300, install it on the user terminal 100, and run the installed user application to use the video communication service functions.

Using the user terminal 100 on which the application is running, the user may receive a language conversion service or a motion recognition-based video communication service.

The service server 300 may provide the user application to the user so that it can be installed, and may interwork with a plurality of user applications that interwork with each other for video communication to provide a language conversion service between the plurality of user applications.

To provide the language conversion service, the service server 300 may obtain voice information provided by any one user application, convert the obtained voice information into text information, and provide at least one of the converted text information or sign language motion information corresponding to the converted text information to another user application.
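As a non-limiting illustration, the delivery of "at least one of" the converted text information or the sign language motion information to the other user applications can be sketched in Python as follows. The per-recipient preference flags and the names `Recipient` and `build_payloads` are assumptions made for this sketch; the specification does not describe how the server decides which form each application receives.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recipient:
    app_id: str
    wants_text: bool = True    # hypothetical per-application setting
    wants_sign: bool = False   # hypothetical per-application setting


def build_payloads(sender_id: str, text: str, sign_motion: List[str],
                   recipients: List[Recipient]) -> Dict[str, dict]:
    """Build one payload per receiving application, skipping the sender."""
    payloads: Dict[str, dict] = {}
    for recipient in recipients:
        if recipient.app_id == sender_id:
            continue  # the speaking application does not receive its own output
        payload: dict = {}
        if recipient.wants_text:
            payload["text"] = text
        if recipient.wants_sign:
            payload["sign_motion"] = sign_motion
        if payload:
            payloads[recipient.app_id] = payload
    return payloads


session = [
    Recipient("app-a"),
    Recipient("app-b", wants_text=False, wants_sign=True),
    Recipient("app-c", wants_sign=True),
]
payloads = build_payloads("app-a", "hello", ["sign:hello"], session)
```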

The service server 300 may provide a service using a plug-in solution. That is, the service server 300 may provide the user application with at least one plug-in solution that uses the language conversion service or motion recognition, and may interwork with the plug-in solution installed in the user application to provide the language conversion service or motion recognition for a video generated by the user terminal 100.

Hereinafter, each component of the plug-in-based video communication platform system according to the present invention will be described in more detail with reference to FIGS. 2 to 10.

FIG. 2 is a block diagram illustrating an embodiment of the user terminal shown in FIG. 1.

Referring to FIG. 2, the user terminal 100 is a portable terminal carried by a user, and various types of terminals are applicable. For example, the user terminal 100 encompasses portable computing devices that a user can carry and that are capable of wireless communication, such as a smartphone, a smart pad, a tablet PC, and PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals.

The user terminal 100 may include a communication unit 110, an input/output unit 120, a memory 130, a control unit 140, and a user application 150.

The communication unit 110 may connect to a wireless network and establish a communication channel with the service server 300. The input/output unit 120 provides the user with input means and output means; for example, in the case of a smartphone, the input/output unit may be a touch screen.

The memory 130 stores data, programs, and the like required for the user terminal 100 to operate. For example, the memory 130 may include a storage medium such as a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), magnetic memory, a magnetic disk, an optical disk, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or programmable read-only memory (PROM). The memory 130 may also take the form of web storage that performs a storage function on the Internet.

The control unit 140 controls the other components of the user terminal 100. For example, the control unit 140 may be a processing unit of the user terminal 100.

The user application 150 is a software application that runs on the user terminal 100. That is, it may be a software function driven by the memory 130 and the control unit 140.

The user application 150 may interwork with the service server 300 to provide a video communication service to the user. That is, the user application 150 may interwork with the control unit 140 to obtain at least some permissions for the communication unit 110 and the input/output unit 120, and thereby provide the video communication service.

The user application 150 may provide a language conversion service to the user through the user terminal.

The user application 150 may receive, from the service server 300, at least one plug-in solution that uses the language conversion service or motion recognition, install it on the user terminal, and use the installed plug-in solution to receive from the service server 300 a video communication service that uses the language conversion service or motion recognition.

The user application 150 can be implemented as various types of programs. For example, it may be implemented as software running on a computing device, or it may be implemented on the basis of a web browser that interworks with a web service provided by the service server.

FIG. 3 is a block diagram illustrating an embodiment of the user application shown in FIG. 2.

Referring to FIG. 3, the user application 150 may include an application platform 151 and at least one plug-in solution 152 to 154.

The application platform 151 presents the user with a list of the plurality of plug-in solutions offered by the service server 300, and provides customized installation of plug-in solutions according to the user's selection.

That is, the application platform 151 functions as the main platform, and the user can easily install and delete plug-in solutions using the application platform 151. The user can therefore install whichever plug-in solutions suit his or her needs and thereby customize the user application.

Examples of the plug-in solution include a first plug-in solution that provides the language conversion service and a second plug-in solution that provides motion recognition. Once installed, a plug-in solution interworks with the service server to provide its function, for example, the language conversion service or motion recognition.
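As a non-limiting illustration, the role of the application platform 151 in listing, installing, and deleting plug-in solutions can be sketched in Python as follows. The class name `ApplicationPlatform`, its methods, and the two catalog entries are hypothetical; in particular, the catalog here is a local dictionary standing in for the list that the service server 300 would supply, and each plug-in is reduced to a simple callable.

```python
from typing import Callable, Dict, List

PluginHandler = Callable[[str], str]  # hypothetical plug-in entry point


class ApplicationPlatform:
    """Main platform that lets the user install and delete plug-in solutions."""

    def __init__(self, catalog: Dict[str, PluginHandler]) -> None:
        self._catalog = dict(catalog)           # plug-ins offered by the server
        self._installed: Dict[str, PluginHandler] = {}

    def available(self) -> List[str]:
        return sorted(self._catalog)

    def installed(self) -> List[str]:
        return sorted(self._installed)

    def install(self, name: str) -> None:
        if name not in self._catalog:
            raise KeyError(f"unknown plug-in: {name}")
        self._installed[name] = self._catalog[name]

    def uninstall(self, name: str) -> None:
        # Deleting a plug-in that is not installed is treated as a no-op.
        self._installed.pop(name, None)

    def run(self, name: str, request: str) -> str:
        if name not in self._installed:
            raise RuntimeError(f"plug-in not installed: {name}")
        return self._installed[name](request)


platform = ApplicationPlatform({
    "language_conversion": lambda req: f"converted({req})",
    "motion_recognition": lambda req: f"recognized({req})",
})
platform.install("language_conversion")
output = platform.run("language_conversion", "clip-1")
```

Keeping the catalog separate from the installed set mirrors the description: the user sees everything the server offers but only the selected plug-in solutions become part of the customized application.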

Because the user application 150 operates in conjunction with the service server 300, it can be understood more easily with reference to the description of the service server 300 below.

Hereinafter, the service server is described in more detail with reference to FIGS. 4 to 11.

FIG. 4 is a diagram illustrating an exemplary computing operating environment of a service server according to an embodiment of the present invention.

FIG. 4 is intended to provide a general, simplified description of a suitable computing environment in which embodiments of the service server 300 may be implemented. Referring to FIG. 4, a computing device is shown as an example of the service server 300.

The computing device may include at least a processing unit 303 and a system memory 301.

The computing device may include a plurality of processing units that cooperate in executing programs. Depending on the exact configuration and type of the computing device, the system memory 301 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory), or a combination of the two. The system memory 301 includes a suitable operating system 302 for controlling the operation of the platform, for example a WINDOWS operating system from Microsoft. The system memory 301 may also include one or more software applications, such as program modules and applications.

The computing device may include an additional data storage device 304, such as a magnetic disk, an optical disk, or tape. Such additional storage may be removable and/or non-removable. Computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. The system memory 301 and the storage 304 are both merely examples of computer-readable storage media. Computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, DVD or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can store the desired information and be accessed by the computing device 300.

The computing device may include input devices 305, such as a keyboard, a mouse, a pen, a voice input device, a touch input device, and comparable input devices. Output devices 306 may include, for example, a display, speakers, a printer, and other types of output device. These devices are well known in the art and are not described in detail here.

The computing device may also include a communication device 307 that allows it to communicate with other devices, for example in a distributed computing environment, over a network such as a wired or wireless network, a satellite link, a cellular link, a local area network, or a comparable mechanism. The communication device 307 is one example of communication media, which may carry computer-readable instructions, data structures, program modules, or other data. By way of example and not limitation, communication media include wired media, such as a wired network or a direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media.

The service server 300 may be described in terms of functional components implemented in such a computing environment, for example software modules or sets of instructions. Various embodiments of the service server are described in more detail below with reference to FIGS. 5 to 11.

FIG. 5 is a block diagram illustrating an embodiment of a service server according to the present invention.

Referring to FIG. 5, the service server 300 may include a main platform 310, a language translation providing unit 320, a motion recognition providing unit 330, and a plug-in solution providing unit 340.

The main platform 310 interworks with the user applications to provide the video communication service. For example, the main platform 310 may provide the functions needed for that interworking: distributing the user application, processing user sign-up, authenticating user logins, establishing communication sessions with user applications, acquiring information provided by a user application and passing it to the other components of the service server 300, and returning the processed data to the user application.

The language translation providing unit 320 may provide a language translation function for data provided by a user application.

The language translation providing unit 320 may provide a translation function between user applications according to the configured country or language.

The language translation providing unit 320 may provide a sign language function. For example, it may convert the voice information of a video provided by a first user application into text and generate sign language motion information corresponding to the converted text information. It may then provide the converted text information or the sign language motion information to at least one second user application interworking with the first user application.

The motion recognition providing unit 330 may provide a motion recognition function for a video handled by a user application, for example a video captured and generated by the user application or a video being streamed.

The motion recognition providing unit 330 may obtain a first video and a second video provided by a first user application, recognize a human motion in each of the first video and the second video, and calculate and provide a matching rate based on the similarity between the recognized motions.

The plug-in solution providing unit 340 may provide plug-in solutions to a user application.

The plug-in solution providing unit 340 stores at least one plug-in solution based on the language conversion service or motion recognition, and may provide at least one plug-in solution to a user application at that application's request. The plug-in solution providing unit 340 may also operate in conjunction with a plug-in solution, based on the language conversion service or motion recognition, that is installed in the user application.

The plug-in solution providing unit 340 may include individual plug-in interworking units 351 to 35n, each for interworking with a respective plug-in solution.

Examples of plug-in solutions include an edutech plug-in solution, a healthcare plug-in solution, and an entertainment plug-in solution, each described later. The plug-in interworking units 351 to 35n may be configured so that there is one interworking unit per plug-in solution. That is, the edutech plug-in interworking unit provides interworking between user terminals running the edutech plug-in solution, the healthcare plug-in interworking unit between user terminals running the healthcare plug-in solution, and the entertainment plug-in interworking unit between user terminals running the entertainment plug-in solution.

The plug-in interworking units 351 to 35n may request language translation or motion recognition from the language translation providing unit 320 or the motion recognition providing unit 330, and may serve as the channel that delivers the processing result to the user application.
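The channel role of a plug-in interworking unit can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the class name, the request kinds `"translation"` and `"motion"`, and the callable signatures are hypothetical, and the two handlers are stubs standing in for the language translation providing unit 320 and the motion recognition providing unit 330.

```python
from typing import Any, Callable


class PluginInterworkingUnit:
    """Hypothetical dispatcher standing in for an interworking unit 351 to 35n."""

    def __init__(
        self,
        translate: Callable[[Any], Any],
        recognize: Callable[[Any], Any],
    ) -> None:
        # Request kind -> server-side unit that processes it (assumed naming).
        self._handlers: dict[str, Callable[[Any], Any]] = {
            "translation": translate,
            "motion": recognize,
        }

    def handle(self, kind: str, payload: Any) -> Any:
        """Forward a plug-in request to the matching unit and return its result."""
        try:
            handler = self._handlers[kind]
        except KeyError:
            raise ValueError(f"unsupported request kind: {kind}") from None
        return handler(payload)


# Stub handlers for illustration only; real units would do actual processing.
unit = PluginInterworkingUnit(
    translate=lambda text: text.upper(),
    recognize=lambda videos: 0.9,
)
caption = unit.handle("translation", "hello")
match_rate = unit.handle("motion", ("trainer.mp4", "user.mp4"))
```

Keeping one dispatcher per plug-in solution mirrors the one-unit-per-solution arrangement described above, while the shared handlers reflect that every plug-in draws on the same two server functions.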

The plug-in interworking units 351 to 35n may also provide other information required by a plug-in solution, for example the streaming information of a streaming video or the name of a motion supplied by a trainer.

Such a plug-in solution is a program module that is not part of the basic installation of the user application but can be installed at the user's request, and it may be any of various solutions developed on top of the language conversion service or motion recognition provided by the service server. A plug-in solution can be developed and registered by a third-party developer separate from the operator of the service server 300. Through such plug-in solutions the user gains access to a variety of solutions, and the third-party developer can generate revenue by selling or operating the solution.

FIGS. 9 to 11 are diagrams illustrating examples of plug-in solutions according to an embodiment of the present invention. Examples of plug-in solutions are described below with reference to FIGS. 9 to 11.

FIG. 9 shows an example of an edutech plug-in solution that provides caption, translation, and sign language functions.

The edutech plug-in solution shown in FIG. 9 may interwork with a plurality of user applications to convert voice information into text information and provide real-time captions. Depending on the sign language setting, it may also generate and provide sign language motion information corresponding to the text information. The edutech plug-in solution may further provide a machine translation function through the language translation setting.

FIG. 10 shows an example of a healthcare plug-in solution that provides health care between a trainer and a user.

The healthcare plug-in solution shown in FIG. 10 may recognize the trainer's motion in a video provided by the trainer's user application, recognize another user's motion in a video provided by another user application operating in conjunction with the trainer's user application, and calculate a matching rate based on the similarity of motion between the two videos, thereby providing training video communication.

The healthcare plug-in solution may provide the name of the motion supplied by the trainer, and may calculate and provide the degree of posture matching, the degree of timing matching, and the degree of balance matching between the trainer's motion and the user's motion.

FIG. 11 shows an example of an entertainment plug-in solution based on video streaming.

The entertainment plug-in solution may recognize a dancer's motion in a streamed reference video (e.g., from YouTube), recognize another user's motion in a video provided by another user application, and calculate a matching rate based on the similarity of motion between the two videos, thereby providing dance video communication.

The entertainment plug-in solution may provide the streaming information of the streaming video (for example, the singer and the song title), and may calculate and provide the degree of posture matching, the degree of timing matching, and the degree of balance matching between the dancer's motion and the user's motion.

Hereinafter, various embodiments of the language translation and motion recognition functions of the service server 300 are described with reference to FIGS. 6 to 8.

FIG. 6 is a block diagram illustrating an embodiment of the language conversion providing unit shown in FIG. 5.

Referring to FIG. 6, the language conversion providing unit 320 may include at least one of a speech-to-text (STT) conversion module 510, a text-to-motion (TTM) conversion module 520, a translation module 530, a text-to-speech (TTS) conversion module 540, and a motion-to-text (MTT) conversion module 550.

The translation module 530 may provide a translation function between different languages. It may check the language set in the user application and translate text information into that language. To this end, the translation module 530 may include a text-based machine translation function.

The STT conversion module 510 may convert voice information into text information.

The STT conversion module 510 may recognize the voice information of a video provided by a user application and convert it into text. The converted text information may be used as captions in a plug-in solution, or may be passed to the translation module 530 for use in the translation function.

The TTM conversion module 520 may generate sign language from text.

The TTM conversion module 520 may generate sign language motion information corresponding to text information. To this end, the TTM conversion module may have a database of sign language motions and output the sign language motion information corresponding to the input text. For example, the sign language motion information may consist of a continuous sequence of sign language video clips, one for each word of the text.
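The per-word lookup described for the TTM conversion module can be sketched as follows. The database contents, the clip paths, and the function name are hypothetical. Two behaviors are assumptions of this sketch and are not stated in the specification: words are matched after lowercasing and stripping trailing punctuation, and words with no entry in the database are skipped.

```python
# Hypothetical sign language motion database: word -> video clip path.
SIGN_CLIPS: dict[str, str] = {
    "hello": "clips/hello.mp4",
    "thank": "clips/thank.mp4",
    "you": "clips/you.mp4",
}


def text_to_motion(text: str, db: dict[str, str]) -> list[str]:
    """Return the ordered clips for each word of `text` found in `db`."""
    clips: list[str] = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")  # assumed normalization
        clip = db.get(word)
        if clip is not None:  # assumption: unknown words are skipped
            clips.append(clip)
    return clips


sequence = text_to_motion("Hello, thank you!", SIGN_CLIPS)
```

Playing the returned clips back to back corresponds to the continuous sequence of sign language videos mentioned above. A word-by-word mapping ignores sign language grammar, so it should be read as the simplest instance of the described database lookup.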

The TTS conversion module 540 may convert text into speech.

The TTS conversion module 540 may generate and provide machine voice information corresponding to text information, based on the configured language (or country). To this end, the TTS conversion module 540 may have, for each language, a database of machine voice information corresponding to each text.

The MTT conversion module 550 may convert sign language into text.

The MTT conversion module 550 may recognize sign language motion in a video provided by a user application, and generate and provide text information corresponding to the recognized motion. To this end, the MTT conversion module 550 may include an image recognition function that recognizes a hand object in an image and outputs text corresponding to the shape of the recognized hand object.

FIG. 7 is a block diagram illustrating an embodiment of the motion recognition providing unit shown in FIG. 5.

Referring to FIG. 7, the motion recognition providing unit 330 may include an object recognition module 610, a motion recognition module 620, and a motion similarity determination module 630.

The object recognition module 610 may recognize a human object in a video provided by a user application. That is, it may recognize an object corresponding to the outline of a person in the video, where the outline is taken to be that of the person's whole body. To this end, the object recognition module 610 includes an image recognition function for video. Because a variety of image object recognition techniques can be applied, no specific technique (algorithm) is prescribed here.

The motion recognition module 620 may extract the posture of the human object recognized by the object recognition module 610. The extracted posture may be represented as a skeleton structure, as in the example shown in FIG. 8.

In one embodiment, the motion recognition module 620 may include a core recognition module 621 and a posture extraction module 622.

The core recognition module 621 may recognize the head and torso of the human object recognized by the object recognition module 610 and treat them as the core. That is, it may recognize the outlines of the head and torso in the human object and generate a head bone and torso bones corresponding to those outlines. As illustrated in FIG. 8, the torso bones may include a shoulder bone, a spine bone, and a pelvis bone.

Based on the head bone and torso bones recognized by the core recognition module 621, the posture extraction module 622 may identify the arms and legs of the human object and generate arm bones and leg bones for them. As illustrated in FIG. 8, one arm bone may be connected to each end of the shoulder bone, and one leg bone to each end of the pelvis bone.

Because the head and torso lie at the center of the body and their outlines are clearly distinguishable from other body parts, the motion recognition module 620 first generates the head and torso bones through the core recognition module 621 and then, on that basis, generates the arm and leg bones through the posture extraction module 622. Unlike an approach that analyzes the whole motion at once, recognizing the core first and the limbs afterwards allows human motion to be recognized quickly and efficiently with few resources.
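The two-stage construction (core first, then limbs attached to the ends of the shoulder and pelvis bones) can be illustrated with a small data-structure sketch. It assumes that 2D keypoints have already been obtained by some upstream detector, which the specification leaves open; the keypoint names, the `Bone` class, and the coordinates are all hypothetical. The sketch shows only the ordering and connectivity, not the image recognition itself or its resource savings.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Bone:
    name: str
    start: Point
    end: Point


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def build_core(kp: dict[str, Point]) -> list[Bone]:
    """Stage 1: head, shoulder, spine and pelvis bones (the core)."""
    neck = midpoint(kp["l_shoulder"], kp["r_shoulder"])
    hip_center = midpoint(kp["l_hip"], kp["r_hip"])
    return [
        Bone("head", kp["head"], neck),
        Bone("shoulder", kp["l_shoulder"], kp["r_shoulder"]),
        Bone("spine", neck, hip_center),
        Bone("pelvis", kp["l_hip"], kp["r_hip"]),
    ]


def attach_limbs(core: list[Bone], kp: dict[str, Point]) -> list[Bone]:
    """Stage 2: arms hang from the shoulder ends, legs from the pelvis ends."""
    bones = {b.name: b for b in core}
    shoulder, pelvis = bones["shoulder"], bones["pelvis"]
    limbs: list[Bone] = []
    for side, s_end, p_end in (
        ("l", shoulder.start, pelvis.start),
        ("r", shoulder.end, pelvis.end),
    ):
        limbs += [
            Bone(f"{side}_upper_arm", s_end, kp[f"{side}_elbow"]),
            Bone(f"{side}_lower_arm", kp[f"{side}_elbow"], kp[f"{side}_wrist"]),
            Bone(f"{side}_upper_leg", p_end, kp[f"{side}_knee"]),
            Bone(f"{side}_lower_leg", kp[f"{side}_knee"], kp[f"{side}_ankle"]),
        ]
    return limbs


# Made-up keypoints for a standing figure (y grows upward).
KEYPOINTS: dict[str, Point] = {
    "head": (0.0, 1.8),
    "l_shoulder": (-0.2, 1.5), "r_shoulder": (0.2, 1.5),
    "l_elbow": (-0.3, 1.2), "r_elbow": (0.3, 1.2),
    "l_wrist": (-0.3, 0.9), "r_wrist": (0.3, 0.9),
    "l_hip": (-0.15, 0.9), "r_hip": (0.15, 0.9),
    "l_knee": (-0.15, 0.5), "r_knee": (0.15, 0.5),
    "l_ankle": (-0.15, 0.0), "r_ankle": (0.15, 0.0),
}
core = build_core(KEYPOINTS)
skeleton = core + attach_limbs(core, KEYPOINTS)
```

Because `attach_limbs` takes the core bones as input, the limbs cannot be built until the core exists, which is the dependency the paragraph above relies on.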

The motion similarity determination module 630 may determine the motion similarity rate by comparing the skeleton structures generated by the motion recognition module 620 with one another.

The motion similarity determination module 630 may compare a first skeleton structure for a first video with a second skeleton structure for a second video to calculate the degree of posture matching of the motion. That is, it may determine similarity by calculating the tilt of the core bones and the angles between bones in each skeleton structure. Specifically, it calculates the tilt of each element of the core skeleton (the head, shoulder, spine, and pelvis bones) and the angles between bones (for example, the angle between the head bone and the spine bone, or between the upper-arm bone and the lower-arm bone), and compares these values between the two structures to calculate the similarity. The calculated similarity may be used as the degree of posture matching and the degree of balance matching.
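The specification names the quantities that are compared (core bone tilts and inter-bone angles) but gives no formula for the similarity itself. The sketch below is therefore one possible scoring under explicit assumptions: each bone is a directed 2D segment, tilts are measured with `atan2`, and similarity is one minus the mean angular difference normalized by 180 degrees. The bone names and sample coordinates are hypothetical.

```python
import math

Bone = tuple[tuple[float, float], tuple[float, float]]

CORE = ("head", "shoulder", "spine", "pelvis")
JOINTS = (("head", "spine"), ("upper_arm", "lower_arm"))


def tilt(bone: Bone) -> float:
    """Direction of a bone in degrees, measured from the x-axis."""
    (x1, y1), (x2, y2) = bone
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def angular_diff(a: float, b: float) -> float:
    """Smallest difference between two angles, in the range [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pose_features(skel: dict[str, Bone]) -> list[float]:
    """Core bone tilts followed by the angles between selected bone pairs."""
    tilts = [tilt(skel[name]) for name in CORE]
    joints = [angular_diff(tilt(skel[a]), tilt(skel[b])) for a, b in JOINTS]
    return tilts + joints


def pose_similarity(s1: dict[str, Bone], s2: dict[str, Bone]) -> float:
    """Assumed score in [0, 1]; 1.0 means every compared angle is identical."""
    diffs = [angular_diff(x, y) for x, y in zip(pose_features(s1), pose_features(s2))]
    return 1.0 - sum(diffs) / (180.0 * len(diffs))


# Made-up skeletons: the second differs only in the lower-arm direction.
trainer: dict[str, Bone] = {
    "head": ((0.0, 2.0), (0.0, 1.5)),
    "shoulder": ((-0.5, 1.5), (0.5, 1.5)),
    "spine": ((0.0, 1.5), (0.0, 0.5)),
    "pelvis": ((-0.3, 0.5), (0.3, 0.5)),
    "upper_arm": ((0.5, 1.5), (0.9, 1.2)),
    "lower_arm": ((0.9, 1.2), (1.2, 1.5)),
}
user = dict(trainer, lower_arm=((0.9, 1.2), (1.3, 1.0)))

same = pose_similarity(trainer, trainer)
different = pose_similarity(trainer, user)
```

Comparing angles rather than raw coordinates makes the score independent of where the person stands in the frame and how large they appear, which is presumably why the description works with tilts and inter-bone angles.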

In one embodiment, the motion similarity determination module 630 may additionally calculate the degree of timing matching. That is, for two videos containing similar postures (postures whose matching rate is at or above a given level), it may calculate how closely the times at which those similar postures occur coincide.
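How the degree of timing matching is scored is not specified. As a sketch under assumptions, suppose the similar postures have already been paired across the two videos, giving a list of (reference time, user time) pairs in seconds. The function name, the linear falloff, and the `tolerance` parameter below are all hypothetical choices.

```python
def timing_match(pairs: list[tuple[float, float]], tolerance: float = 1.0) -> float:
    """Mean timing score in [0, 1] over matched posture pairs.

    Each pair scores 1.0 when the postures occur at the same time and falls
    linearly to 0.0 once the offset reaches `tolerance` seconds (assumed rule).
    """
    if not pairs:
        return 0.0
    scores = [
        max(0.0, 1.0 - abs(t_ref - t_user) / tolerance)
        for t_ref, t_user in pairs
    ]
    return sum(scores) / len(scores)


# Two matched postures: one on time, one half a second late.
score = timing_match([(1.0, 1.0), (2.0, 2.5)])
```

With a tolerance of one second, the on-time posture scores 1.0 and the late one 0.5, so the overall degree of timing matching is 0.75.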

Various embodiments of the plug-in-based video communication platform system have been described above with reference to FIGS. 1 to 11.

Hereinafter, a plug-in-based method for providing video communication according to an embodiment of the present invention is described.

Because the plug-in-based method for providing video communication described below is performed on the plug-in-based video communication platform system described with reference to FIGS. 1 to 11, it can be understood more easily with reference to that earlier description.

FIG. 12 is a flowchart illustrating a plug-in-based method for providing video communication according to an embodiment of the present invention.

Referring to FIG. 12, the service server 300 may provide a user application with at least one plug-in solution that uses the language conversion service or motion recognition (S1110).

The service server 300 may interwork with the plug-in solution installed in the user application to provide the language conversion service or motion recognition for a video generated by the user terminal (S1120).

In one embodiment of step S1120, the service server 300 may perform the steps of converting the voice information of a video provided by a first user application into text, generating sign language motion information corresponding to the converted text information, and providing the converted text information or the sign language motion information to at least one second user application interworking with the first user application.

In another embodiment of step S1120, the service server 300 may perform the steps of obtaining a first video and a second video provided by a first user application, recognizing a human motion in each of the first video and the second video, and providing motion recognition by calculating a matching rate based on the similarity between the recognized motions.

Here, the plug-in solution may include at least one of: an edutech plug-in solution that interworks with a plurality of user applications to convert voice information into text information and, depending on the settings, generates and provides sign language motion information corresponding to the text information; a healthcare plug-in solution that recognizes the trainer's motion in a video provided by the trainer's user application, recognizes another user's motion in a video provided by another user application operating in conjunction with the trainer's user application, and calculates a matching rate based on the similarity of motion between the two videos to provide training video communication; and an entertainment plug-in solution that recognizes a dancer's motion in a streamed reference video, recognizes another user's motion in a video provided by another user application, and calculates a matching rate based on the similarity of motion between the two videos to provide dance video communication.

The present invention described above is not limited by the foregoing embodiments or the accompanying drawings but by the claims that follow, and a person of ordinary skill in the art to which the present invention pertains will readily understand that its configuration may be varied and modified in many ways without departing from the technical spirit of the invention.

100: user terminal
300: service server
110: communication unit 120: input/output unit
130: memory 140: control unit
150: user application
301: system memory 302: operating system
303: processing unit 304: storage device
305: input device 306: output device
307: communication device
310: main platform 320: language translation providing unit
330: motion recognition providing unit 340: plug-in solution providing unit
510: STT conversion module 520: TTM conversion module
530: translation module 540: TTS conversion module
550: MTT conversion module
610: object recognition module 620: motion recognition module
630: motion similarity determination module

Claims

A plug-in-based video communication platform system comprising:
a user application installed in a user terminal and providing a language conversion service to a user through the user terminal; and
a service server that interworks with a plurality of user applications linked to one another for video communication to provide the language conversion service between the plurality of user applications, and that, to provide the language conversion service, obtains voice information provided by any one user application, converts the obtained voice information into text information, and provides at least one of the converted text information or sign language motion information corresponding to the converted text information to another user application,
wherein the user application has installed in the user terminal, according to the user's selection, a first plug-in solution providing the language conversion service or a second plug-in solution providing motion recognition, and uses the installed first or second plug-in solution to receive from the service server the language conversion service or a video communication service using motion recognition,
wherein the service server provides the user application with the first plug-in solution providing the language conversion service or the second plug-in solution providing motion recognition, provides the language conversion service or the motion recognition for a video generated by the user terminal in conjunction with the plug-in solution installed in the user application, and includes a motion recognition providing unit that obtains a first video and a second video provided from a first user application, recognizes a human motion in each of the first video and the second video, and calculates a match rate based on the similarity between the recognized motions,
wherein the plug-in solution includes:
an edutech plug-in solution that converts voice information into text information in conjunction with the plurality of user applications, and generates and provides sign language motion information corresponding to the text information according to settings;
a healthcare plug-in solution that provides training video communication by recognizing the motion of a trainer in a video provided by a trainer user application, recognizing the motion of another user in a video provided by another user application operating in conjunction with the trainer user application, and calculating a match rate based on the similarity of the motions in the two videos, and that calculates and provides the name of the trainer's motion and the degrees of posture matching, timing matching, and balance matching between the trainer's motion and the user's motion; and
an entertainment plug-in solution that provides dance video communication by recognizing the motion of a dancer in a reference video stream, recognizing the motion of another user in a video provided by another user application, and calculating a match rate based on the similarity of the motions in the two videos, and that calculates and provides streaming information of the reference video stream and information on posture matching, timing matching, and balance matching between the dancer's motion and the user's motion, and
wherein the user application includes an application platform that provides the user with a list of a plurality of plug-in solutions provided by the service server and provides customized installation of plug-in solutions according to the user's selection.

delete

The plug-in-based video communication platform system of claim 1, wherein the service server comprises
a language translation providing unit that converts voice information of a video provided by a first user application into text, generates sign language motion information corresponding to the converted text information, and provides the converted text information or the sign language motion information to at least one second user application interworking with the first user application.

The plug-in-based video communication platform system of claim 4, wherein the service server further comprises
a plug-in solution providing unit that stores at least one plug-in solution based on a language conversion service or motion recognition, provides the at least one plug-in solution to the user application at the request of the user application, and operates the plug-in solution based on the language conversion service or motion recognition in conjunction with the plug-in solution installed in the user application.

delete

The plug-in-based video communication platform system of claim 4, wherein the language translation providing unit comprises:
a translation module that checks the language set in the user application and translates the text information according to the set language;
an STT (Speech to Text) conversion module that recognizes the voice information of the video provided by the user application and converts it into text; and
a TTM (Text to Motion) conversion module that generates sign language motion information corresponding to the text information.

The plug-in-based video communication platform system of claim 7, wherein the language translation providing unit further comprises at least one of:
a TTS (Text to Speech) conversion module that generates and provides machine voice information corresponding to the text information based on the set language; and
an MTT (Motion to Text) conversion module that recognizes sign language motion in the video provided by the user application, and generates and provides text information corresponding to the recognized sign language motion.

A plug-in-based video communication providing method performed in a service server that provides video communication in conjunction with a user application installed in a user terminal, the method comprising:
providing the user application with at least one plug-in solution using a language conversion service or motion recognition; and
providing the language conversion service or the motion recognition for a video generated in the user terminal in conjunction with the plug-in solution installed in the user application,
wherein the step of providing the language conversion service or the motion recognition comprises:
acquiring a first video and a second video provided by a first user application;
recognizing a human motion in each of the first video and the second video; and
providing motion recognition by calculating a match rate based on the similarity between the recognized motions,
wherein the plug-in solution includes:
an edutech plug-in solution that converts voice information into text information in conjunction with the user application, and generates and provides sign language motion information corresponding to the text information according to settings;
a healthcare plug-in solution that provides training video communication by recognizing the motion of a trainer in a video provided by a trainer user application, recognizing the motion of another user in a video provided by another user application operating in conjunction with the trainer user application, and calculating a match rate based on the similarity of the motions in the two videos, and that calculates and provides the name of the trainer's motion and the degrees of posture matching, timing matching, and balance matching between the trainer's motion and the user's motion; and
an entertainment plug-in solution that provides dance video communication by recognizing the motion of a dancer in a reference video stream, recognizing the motion of another user in a streaming video provided by another user application, and calculating a match rate based on the similarity of the motions in the two videos, and that calculates and provides streaming information of the reference video stream and information on posture matching, timing matching, and balance matching between the dancer's motion and the user's motion, and
wherein the user application includes an application platform that provides the user with a list of a plurality of plug-in solutions provided by the service server and provides customized installation of plug-in solutions according to the user's selection.

10. The plug-in-based video communication providing method of claim 9, wherein the step of providing the language conversion service or the motion recognition comprises:
converting voice information of a video provided by a first user application into text;
generating sign language motion information corresponding to the converted text information; and
providing the converted text information or the sign language motion information to at least one second user application interworking with the first user application.

delete