KR102575038B1

KR102575038B1 - Apparatus and method for video conferencing service

Info

Publication number: KR102575038B1
Application number: KR1020210046723A
Authority: KR
Inventors: 이철원; 박병권; 박인범; 경혜윤; 서민지; 정동욱; 노정일
Original assignee: (주)날리지포인트
Priority date: 2020-07-14
Filing date: 2021-04-09
Publication date: 2023-09-07
Also published as: KR20220009318A

Abstract

The present invention relates to an apparatus and method for providing a video conference service. According to an embodiment of the present invention, the video conference service providing apparatus can recognize the speaker who is currently speaking by using the intensity and duration of a user's voice.

Description

Apparatus and method for providing video conference service {APPARATUS AND METHOD FOR VIDEO CONFERENCING SERVICE}

The present invention relates to technology for providing a video conference service, and more particularly to a video conference service providing apparatus and method that recognize the speaker in a video conference and emphasize the video of the user determined to be the speaker.

A video conferencing system is a technology for real-time visual communication between two or more users in different geographic locations. It refers to a system that provides users with a conference environment by sharing video and audio data in real time.

Video conferencing systems are applied in various fields such as meetings, education, consultations, interviews, and seminars. In particular, the video conferencing system market is expanding significantly as the government actively introduces smart work schemes that include remote and flexible working.

In general, a video conferencing system recognizes the speaker through the transmitted video and audio. However, when the system does not run smoothly because of a poor network environment, it can be difficult to detect the speaker from the users' video and audio alone.

In addition, when several users participating in a video conference speak at the same time, it is difficult to detect the speaker.

The background art of the present invention is disclosed in Korean Patent Registration No. 10-1094766.

An object of the present invention is to provide a video conference service providing apparatus that recognizes the speaker who is currently speaking by using the intensity and duration of a user's voice.

Another object of the present invention is to provide a video conference service providing apparatus that clearly distinguishes the speaker by marking the speaker's video among the video conference users.

Another object of the present invention is to provide a video conference service providing apparatus that controls video conference content according to a user's request.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

According to one aspect of the present invention, a video conference service providing apparatus is provided.

A video conference service providing apparatus according to an embodiment of the present invention may include a communication unit that communicates with user terminals, a conference content unit that receives video conference data from the user terminals and generates video conference content, a speaker detection unit that determines the speaker based on the user's voice included in the video conference data, and an output control unit that, when speaker information is received from the speaker detection unit, controls the video conference content using the speaker information.

According to another aspect of the present invention, a method for providing a video conference service is provided.

A method for providing a video conference service according to an embodiment of the present invention may include communicating with user terminals, receiving video conference data from the user terminals and generating video conference content, determining the speaker based on the user's voice included in the video conference data, and, when speaker information is received from the speaker detection unit, controlling the video conference content using the speaker information.

According to an embodiment of the present invention, the video conference service providing apparatus can recognize the speaker who is currently speaking by using the intensity and duration of a user's voice.

According to an embodiment of the present invention, the video conference service providing apparatus can clearly distinguish the speaker by marking the speaker's video among the video conference users.

According to an embodiment of the present invention, the video conference service providing apparatus can control video conference content according to a user's request.

The effects of the present invention are not limited to those described above and should be understood to include all effects that can be inferred from the description of the present invention or from the configuration of the invention set forth in the claims.

FIG. 1 is a diagram showing the configuration of a video conference service providing system 10 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a video conference service providing apparatus 100 according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method for detecting a speaker in a video conference according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an automatic speaker display screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method for automatically displaying a speaker according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a manual speaker display screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for manually displaying a speaker according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating an editing screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a screen editing method according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a caption screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 11 is a flowchart illustrating a method for providing captions according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating an interpretation/translation screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating a method for providing interpretation and translation according to an embodiment of the present invention.
FIG. 14 is a diagram illustrating a speech request screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 15 is a flowchart illustrating a method for providing interpretation and translation according to an embodiment of the present invention.
FIG. 16 is a diagram illustrating a mute state notification screen provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 17 is a flowchart illustrating a method for providing a video conference service according to an embodiment of the present invention.
FIG. 18 is a diagram illustrating meeting minutes that include memo contents, provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 19 is a flowchart illustrating a method for providing meeting minutes according to an embodiment of the present invention.
FIG. 20 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to an embodiment of the present invention.
FIG. 21 is a flowchart illustrating a method for providing edited meeting minutes according to an embodiment of the present invention.
FIG. 22 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.
FIG. 23 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.
FIG. 24 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.
FIG. 25 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.
FIG. 26 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.
FIG. 27 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.
FIG. 28 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.
FIG. 29 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.
FIG. 30 is a diagram illustrating an attendance book provided by a video conference service providing apparatus according to another embodiment of the present invention.
FIG. 31 is a flowchart illustrating a method for providing an attendance book according to another embodiment of the present invention.
FIG. 32 is a flowchart illustrating a typo correction method according to another embodiment of the present invention.
FIG. 33 is a flowchart illustrating a method for providing a video conference service according to another embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the subject matter of the invention. In addition, singular expressions used in this specification and in the claims should generally be construed to mean "one or more" unless stated otherwise.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a diagram showing the configuration of a video conference service providing system 10 according to an embodiment of the present invention. Referring to FIG. 1, the video conference service providing system 10 according to an embodiment of the present invention may include a video conference service providing apparatus 100 and a plurality of user terminals 200.

The video conference service providing apparatus 100 is connected to each user terminal 200 through wired or wireless network communication.

The video conference service providing apparatus 100 receives video conference data from the user terminals 200 and transmits video conference content to the user terminals 200. For example, the video conference service providing apparatus 100 may generate video conference content that includes the user video and user voice received from the user terminals 200 and transmit it to each user terminal 200. In addition to user video and user voice, the video conference data may include user identification information such as a user name, character, date of birth, and IP address.
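The data exchanged between the terminals and the apparatus can be pictured as simple records. The following is a minimal sketch only: the class and field names (`UserIdentification`, `VideoConferenceData`, `VideoConferenceContent`, `terminal_id`, and so on) are hypothetical and do not come from the patent, which specifies neither a wire format nor a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserIdentification:
    """Identification fields named in the description (all optional but the name)."""
    user_name: str
    character: Optional[str] = None      # hypothetical avatar/character identifier
    date_of_birth: Optional[str] = None  # kept as text; the patent gives no format
    ip_address: Optional[str] = None


@dataclass
class VideoConferenceData:
    """What one user terminal sends: user video, user voice, identification."""
    terminal_id: str
    video_frame: bytes
    audio_chunk: bytes
    identification: UserIdentification


@dataclass
class VideoConferenceContent:
    """What the apparatus sends back: the video and voice of every participant."""
    streams: Dict[str, VideoConferenceData] = field(default_factory=dict)

    def add(self, data: VideoConferenceData) -> None:
        # Keep the latest data per terminal.
        self.streams[data.terminal_id] = data
```

Keying the content by terminal keeps one entry per participant, which matches the description of content built from the data of each connected terminal.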

The video conference service providing apparatus 100 may emphasize, in the video conference content, the video of the user determined to be the speaker.

When receiving a name tag generation request from the user terminal 200, the video conference service providing apparatus 100 may extract the user's name and display it in the video conference content.

When receiving a border emphasis request from the user terminal 200, the video conference service providing apparatus 100 may change the border color of the user video.

When receiving a character display request from the user terminal 200, the video conference service providing apparatus 100 may convert the user video into a character.

When receiving a screen editing request from the user terminal 200, the video conference service providing apparatus 100 may separate a desired video from the video conference content by drag and drop, or change the size of a video by resizing it.

The video conference service providing apparatus 100 may display caption information generated by the caption generation unit 105.

The video conference service providing apparatus 100 may display interpretation/translation information generated by the interpretation/translation unit 106.

When receiving a speech request from the user terminal 200, the video conference service providing apparatus 100 may identify the speaker who is currently speaking and control the users' voices.

When the user terminal 200 that has been given the right to speak through a speech request is muted, the video conference service providing apparatus 100 may generate a screen that notifies the user of the muted state.

When a user writes meeting-related memos during a meeting, the video conference service providing apparatus 100 may include the memo information in the meeting minutes it generates.

The video conference service providing apparatus 100 may generate edited meeting minutes by extracting the meeting information that corresponds to the meeting content a user searched for.

The video conference service providing apparatus 100 may generate edited meeting minutes that include the text information corresponding to a speaker selected by the user.

The video conference service providing apparatus 100 may generate edited meeting minutes that include the text information corresponding to a meeting time the user searched for.
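The three kinds of edited minutes described above (by searched content, by selected speaker, by searched time) can be viewed as filters over the recognized utterances. The sketch below is an illustration under assumptions: the `Utterance` record, the `edit_minutes` function, and the case-insensitive substring match are hypothetical choices, not the patent's method.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Utterance:
    speaker: str
    start_s: float  # seconds from the start of the meeting
    text: str


def edit_minutes(
    utterances: Iterable[Utterance],
    keyword: Optional[str] = None,
    speaker: Optional[str] = None,
    start_s: Optional[float] = None,
    end_s: Optional[float] = None,
) -> List[Utterance]:
    """Return only the utterances that match every filter that was supplied."""
    selected = []
    for u in utterances:
        if keyword is not None and keyword.lower() not in u.text.lower():
            continue  # searched content does not appear in this utterance
        if speaker is not None and u.speaker != speaker:
            continue  # not the selected speaker
        if start_s is not None and u.start_s < start_s:
            continue  # before the searched time window
        if end_s is not None and u.start_s > end_s:
            continue  # after the searched time window
        selected.append(u)
    return selected
```

Leaving every filter optional lets one function cover all three cases, and combining filters narrows the result further.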

The video conference service providing apparatus 100 may generate meeting participation information using the meeting information and may generate meeting minutes that include the meeting participation information.

When receiving interpretation/translation information from the interpretation/translation unit 106, the video conference service providing apparatus 100 may generate edited meeting minutes using the translated text.

When face recognition determines that the person shown on the attendance screen before the meeting starts and the person shown on the attendance screen during the meeting are different people, the video conference service providing apparatus 100 may include text indicating the face recognition mismatch on the attendance book screen.

The user terminal 200 is a device used to conduct a video conference. It may include electronic devices such as desktops, laptops, smartphones, and PDAs, or wearable devices such as smart glasses and smart goggles, that are capable of capturing the user's video and recording the user's voice, but the present invention is not limited thereto.

The user terminal 200 may acquire the user video through a video input module, the user voice through an audio input module, and the user's video conference participation information through various types of input modules. The user terminal 200 may transmit the acquired user video, user voice, user identification information, and the like to the video conference service providing apparatus 100.

The user terminal 200 may output the video conference content received from the video conference service providing apparatus 100 on a screen. Here, the user may change the video conference content by entering user request information through an input module. That is, the user terminal 200 may customize the video conference content according to the user's preferences.

The user terminals 200 may be located in different places. Through the screen output on the user terminal 200, users can identify the speaker who is currently speaking and can be provided with meeting minutes generated from the contents of the meeting.

FIG. 2 is a block diagram showing the configuration of the video conference service providing apparatus 100 according to an embodiment of the present invention.

Referring to FIG. 2, the video conference service providing apparatus 100 according to an embodiment of the present invention may include a communication unit 101, a conference content unit 102, a speaker detection unit 103, an output control unit 104, a voice recognition unit 105, an interpretation/translation unit 106, a meeting minutes generation unit 107, a meeting attendance determination unit 108, and a storage unit 109.

The communication unit 101 communicates with the user terminal 200 through a wired or wireless network. For example, the communication unit 101 may use wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Service Digital Networks), or wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the present invention is not limited thereto.

The communication unit 101 receives video conference data from the connected user terminal 200. Here, the video conference data may include user video and user voice as well as user identification information such as a user name, character, date of birth, and IP address. The communication unit 101 may store the received video conference data in the storage unit 109.

When the conference content unit 102 receives video conference data from the user terminals 200, it creates a conference room in which the meeting can be held. Specifically, when video conference data is received from three user terminals 200, the conference content unit 102 creates a conference room and allows the three user terminals 200 to enter it.

At this time, the conference content unit 102 may perform an invitation confirmation procedure for the user terminal 200. For example, the conference content unit 102 may set a separate password so that only users invited in advance can attend the created conference room. By admitting only users who enter the password correctly, the conference content unit 102 can strengthen the security of the meeting contents.
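The invitation check can be sketched as a password-gated room. This is an illustration only: the `ConferenceRoom` class and `try_enter` method are hypothetical names, and storing a salted PBKDF2 hash with a constant-time comparison is a common practice added here as an assumption. The patent says only that a separate password may be set.

```python
import hashlib
import hmac
import os


class ConferenceRoom:
    """A room that admits only terminals presenting the room password."""

    def __init__(self, password: str) -> None:
        # Store a salted hash instead of the plain password (an added assumption).
        self._salt = os.urandom(16)
        self._digest = self._hash(password)
        self.participants = set()

    def _hash(self, password: str) -> bytes:
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), self._salt, 100_000
        )

    def try_enter(self, terminal_id: str, password: str) -> bool:
        """Admit the terminal if the password matches; otherwise refuse."""
        # compare_digest avoids leaking information through comparison timing.
        if hmac.compare_digest(self._digest, self._hash(password)):
            self.participants.add(terminal_id)
            return True
        return False
```

A terminal that supplies the wrong password never appears in `participants`, which mirrors the description of restricting entry to invited users.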

Once the conference room is created, the conference content unit 102 generates video conference content using the video conference data received from each user terminal 200. For example, when connected to four user terminals 200, the conference content unit 102 may generate video conference content that includes the four sets of video conference data. Here, the video conference content is generated so as to include the users' voices and the users' video.

The conference content unit 102 may transmit the generated video conference content to the user terminal 200. The conference content unit 102 may also store the generated video conference content in the storage unit 109.

The speaker detection unit 103 determines the speaker based on the users' voices. Specifically, the speaker detection unit 103 analyzes the intensity and the duration of each user's voice included in the video conference data. When the intensity of a user's voice exceeds a preset value and the duration of that voice is equal to or longer than a preset time, the speaker detection unit 103 determines that user to be the speaker. In that case, the speaker detection unit 103 may generate speaker information and transmit it to the output control unit 104.
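The intensity-and-duration rule can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: measuring intensity as the RMS of normalized samples, counting duration as consecutive loud frames, and all names and default thresholds (`SpeakerDetector`, `intensity_threshold`, `min_duration_ms`, `frame_ms`) are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples assumed in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SpeakerDetector:
    """Flags a user as the speaker once their voice stays loud for long enough."""

    def __init__(
        self,
        intensity_threshold: float = 0.1,  # hypothetical preset intensity value
        min_duration_ms: int = 500,        # hypothetical preset duration
        frame_ms: int = 20,                # length of each audio frame
    ) -> None:
        self.intensity_threshold = intensity_threshold
        self.min_duration_ms = min_duration_ms
        self.frame_ms = frame_ms
        self._voiced_ms: Dict[str, int] = {}

    def update(self, user_id: str, frame: Sequence[float]) -> bool:
        """Feed one frame for one user; return True if that user is the speaker."""
        if rms(frame) > self.intensity_threshold:
            # Voice is above the preset intensity: extend the running duration.
            self._voiced_ms[user_id] = self._voiced_ms.get(user_id, 0) + self.frame_ms
        else:
            # A quiet frame resets the duration, so short noises are ignored.
            self._voiced_ms[user_id] = 0
        return self._voiced_ms[user_id] >= self.min_duration_ms
```

Because each user's duration is tracked separately, several users can be flagged at once when they talk over each other. How the apparatus resolves that case is not covered by this sketch.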

When receiving speaker information from the speaker detection unit 103, the output control unit 104 may control the video conference content based on the speaker information. For example, the output control unit 104 emphasizes the video of the user determined to be the speaker in the video conference content and provides it to the plurality of user terminals, so that the meeting participants can visually identify the speaker.

In addition, the output control unit 104 controls the video conference content when it receives a manual speaker display request signal from the user terminal 200. That is, even without receiving speaker information from the speaker detection unit 103, the output control unit 104 can control the video conference content for the plurality of user terminals in response to a manual speaker display request from a user terminal. For example, when receiving a name tag generation request from the user terminal 200, the output control unit 104 may extract the user's name and display it in the video conference content. When receiving a border emphasis request from the user terminal 200, the output control unit 104 may emphasize, in the video conference content, the video of the user who requested manual display. When receiving a character extraction request from the user terminal 200, the output control unit 104 may replace the user's video with a character in the video conference content.
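The manual display requests can be pictured as a small dispatcher that updates the display state of the requesting user's video tile. The sketch below is hypothetical throughout: the `Tile` record, the `apply_request` function, and the request strings `"name_tag"`, `"border_emphasis"`, and `"character"` are illustrative names, not identifiers from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tile:
    """Display state of one user's video within the conference content."""
    user_id: str
    user_name: str
    character: Optional[str] = None  # character registered for the user, if any
    show_name_tag: bool = False
    border_emphasized: bool = False
    show_character: bool = False


def apply_request(tile: Tile, request: str) -> Tile:
    """Apply one manual speaker display request to the requesting user's tile."""
    if request == "name_tag":
        tile.show_name_tag = True          # display the user's name on the video
    elif request == "border_emphasis":
        tile.border_emphasized = True      # e.g. change the border color
    elif request == "character":
        # Replace the video with a character only if one is available.
        tile.show_character = tile.character is not None
    else:
        raise ValueError(f"unknown request: {request}")
    return tile
```

The updated tile state would then be rendered into the content sent to every terminal, since the description applies manual display to the plurality of user terminals.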

출력제어부(104)는 사용자 단말(200)로부터 화면 변경 요청을 수신할 경우, 사용자의 입력에 따라 화상 회의 컨텐츠를 드래그 앤 드롭 또는 리사이징하여 요청한 사용자 단말(200)로 전송할 수 있다.When receiving a screen change request from the user terminal 200, the output control unit 104 may drag and drop or resize the video conference content according to the user's input and transmit it to the requested user terminal 200.

출력제어부(104)는 문자변환부(105)로부터 자막 정보를 수신할 경우, 자막 정보를 기반으로 화상 회의 컨텐츠에서 자막 화면을 생성할 수 있다. When receiving caption information from the text conversion unit 105, the output control unit 104 may generate a caption screen from video conference content based on the caption information.

출력제어부(104)는 통번역부(106)로부터 통번역 정보를 수신할 경우, 번역 텍스트 또는 통역 음성 중 적어도 하나를 화상 회의 컨텐츠에 표시할 수 있다. When receiving translation information from the interpretation/translation unit 106, the output control unit 104 may display at least one of translated text and interpretation voice on the video conference content.

이외에도, 출력제어부(104)는 사용자 단말(200)의 요청 신호에 따라 화상 회의 컨텐츠를 다양하게 제어할 수 있다. In addition, the output control unit 104 may control video conference contents in various ways according to a request signal of the user terminal 200 .

음성인식부(105)는 수신한 사용자 음성을 인식한다. 구체적으로, 음성인식부(105)는 사용자 음성에 대한 특징 정보를 추출하고, 추출된 특징 정보를 학습된 모델에 입력하여 사용자 음성을 인식을 수행함으로써 사용자의 음성을 문자로 변환할 수 있다. 여기서, 음성인식부(105)는 변환된 문자를 이용하여 문자 정보, 자막 정보 등을 생성할 수 있다. 음성인식부(105)는 문자 정보, 자막 정보를 출력제어부(104) 또는 회의록생성부(107)로 전송할 수 있다. The voice recognition unit 105 recognizes the received user voice. Specifically, the voice recognition unit 105 may convert the user's voice into text by extracting feature information of the user's voice and inputting the extracted feature information to the learned model to recognize the user's voice. Here, the voice recognition unit 105 may generate text information, subtitle information, and the like using the converted text. The voice recognition unit 105 may transmit text information and subtitle information to the output control unit 104 or the meeting minutes generation unit 107 .

통번역부(106)는 수신한 사용자 음성을 기반으로 통번역 요청 언어에 대응하는 번역 텍스트를 생성한다. 예를 들어, 통번역부(106)는 사용자 단말(200)로부터 사용자 음성에 대한 영어 통번역을 요청 받을 경우, 수신한 사용자 음성에 대응하는 영어 통번역 정보를 생성한다. 여기서, 통번역 정보는 사용자가 통번역을 요청한 언어의 번역 텍스트 및 통역 음성을 포함할 수 있다. 통번역부(106)는 생성된 통번역 정보를 출력제어부(104) 또는 회의록생성부(107)로 전송할 수 있다. The interpretation/translation unit 106 generates translated text corresponding to the requested language based on the received user voice. For example, when the interpretation/translation unit 106 receives a request from the user terminal 200 for English interpretation/translation of a user voice, it generates English interpretation/translation information corresponding to the received user voice. Here, the interpretation/translation information may include translated text and interpreted voice in the language requested by the user. The interpretation/translation unit 106 may transmit the generated interpretation/translation information to the output control unit 104 or the meeting minutes generation unit 107.

회의록생성부(107)는 회의를 통해 생성된 회의 정보를 이용하여 회의록을 생성한다. 여기서, 회의 정보는 문자 정보, 자막 정보, 통번역 정보 등을 포함할 수 있으며, 회의록생성부(107)는 문자 정보, 자막 정보, 통번역 정보 중 적어도 하나를 기반으로 회의록을 생성할 수 있다. 여기서, 회의록생성부(107)는 회의 중 작성된 메모가 존재할 경우, 메모 정보를 병합한 회의록을 생성할 수 있다. The meeting minutes generation unit 107 generates meeting minutes using meeting information generated through a meeting. Here, the meeting information may include text information, subtitle information, interpretation/translation information, and the like, and the meeting minutes generating unit 107 may generate meeting minutes based on at least one of text information, subtitle information, and interpretation/translation information. Here, the meeting minutes generating unit 107 may generate meeting minutes by merging the memo information when there are memos created during the meeting.

회의록생성부(107)는 사용자의 요청에 따라 다양한 형식의 편집 회의록을 생성할 수 있다. The meeting minutes generation unit 107 may generate edited meeting minutes in various formats according to a user's request.

예를 들어, 회의록생성부(107)는 사용자 단말로부터 회의 내용 검색 요청을 수신할 경우, 검색된 회의 내용과 대응하는 회의 정보를 추출하여 편집 회의록을 생성할 수 있다. For example, when receiving a search request for meeting content from a user terminal, the meeting minutes generation unit 107 may extract meeting information corresponding to the searched meeting content and generate edited meeting minutes.

회의록생성부(107)는 사용자 단말로부터 화자 한정 회의 내용 검색 요청을 수신할 경우, 선택된 화자와 대응하는 회의 정보를 추출하여 편집 회의록을 생성할 수 있다. When receiving a speaker-limited conference content search request from the user terminal, the meeting minutes generation unit 107 may extract conference information corresponding to the selected speaker and generate edited meeting minutes.

회의록생성부(107)는 사용자 단말로부터 시간 한정 회의 내용 검색 요청을 수신할 경우, 검색된 시간과 대응하는 회의 정보를 추출하여 편집 회의록을 생성할 수 있다. When receiving a search request for content of a time-limited meeting from a user terminal, the meeting minutes generation unit 107 may extract meeting information corresponding to the searched time and generate edited meeting minutes.

회의록생성부(107)는 사용자 단말로부터 회의 내용 통계 요청을 수신할 경우, 회의 정보를 이용하여 회의 참여도 정보를 생성하고, 회의 참여도 정보가 포함된 편집 회의록을 생성할 수 있다. 여기서, 회의 참여도 정보는 화자별 회의 참여도를 나타내는 그래프를 포함하는 정보이다. 회의록생성부(107)는 기설정된 기준에 따라 회의 정보를 이용하여 화자별 회의 참여도 정보를 생성할 수 있다. When receiving a request for statistics on conference contents from the user terminal, the meeting minutes generation unit 107 may generate meeting participation rate information using the meeting information and may generate edited meeting minutes including the meeting participation rate information. Here, the conference participation rate information is information including a graph representing the conference participation rate for each speaker. The meeting minutes generation unit 107 may generate meeting participation degree information for each speaker using the meeting information according to a predetermined criterion.

회의록생성부(107)는 통번역부(106)로부터 통번역 정보를 수신할 경우, 번역 텍스트를 이용하여 편집 회의록을 생성할 수 있다. When receiving interpretation/translation information from the interpretation/translation unit 106, the meeting minutes generation unit 107 may generate edited meeting minutes using the translated text.

회의참석판단부(108)는 회의 시작 전 및 회의 중 참석 화면을 수신하고, 두 화면을 입력으로 AI 기반 얼굴 인식을 수행하여 동일 인물 여부 확인 절차를 수행할 수 있다. 또한, 얼굴 인식 결과에 따라 출석 및 결석 정보를 생성하여 참석 화면과 함께 출석 여부를 확인할 수 있는 출석부를 제공할 수 있다. The meeting attendance determination unit 108 may receive attendance screens captured before the meeting starts and during the meeting, and may perform AI-based face recognition with the two screens as input to verify whether they show the same person. In addition, it may generate attendance and absence information according to the face recognition result and provide an attendance sheet, together with the attendance screens, for checking attendance.

저장부(109)는 화상 회의를 통해 생성된 정보들을 저장한다. 예를 들어, 저장부(109)는 문자 정보, 자막 정보, 통번역 정보 등을 저장한다. 여기서, 저장부(109)는 문자 정보, 자막 정보 및 통번역 정보에서 맞춤법 판단을 수행하여 오탈자 또는 변환되지 않은 문장을 수정하여 저장할 수 있다. The storage unit 109 stores information generated through a video conference. For example, the storage unit 109 stores text information, subtitle information, interpretation/translation information, and the like. Here, the storage unit 109 may perform a spelling check on the text information, subtitle information, and interpretation/translation information, and correct typos or unconverted sentences before storing them.

도 3은 본 발명의 일 실시예에 따른 화상 회의 화자 감지 방법을 설명하는 흐름도이다. 3 is a flowchart illustrating a method for detecting a speaker in a video conference according to an embodiment of the present invention.

도 3을 참조하면, S301에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)과 통신을 수행한다. Referring to FIG. 3 , in S301 , the video conference service providing device 100 communicates with the user terminal 200 .

S303에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 화상 회의 데이터를 수신한다. In S303, the video conference service providing device 100 receives video conference data from the user terminal 200.

S305에서 화상 회의 서비스 제공 장치(100)는 수신한 각각의 화상 회의 데이터에서 사용자 음성의 세기가 기설정된 수치를 초과할 경우, S307로 이동한다. In S305, the video conference service providing apparatus 100 moves to S307 when the intensity of the user's voice exceeds a predetermined value in each received video conference data.

S307에서 화상 회의 서비스 제공 장치(100)는 수신한 각각의 사용자 음성의 지속시간이 기설정된 시간 이상일 경우, S309로 이동한다. In S307, when the duration of each user's voice received is longer than a predetermined time, the video conference service providing apparatus 100 moves to S309.

S309에서 화상 회의 서비스 제공 장치(100)는 사용자의 음성을 화자로 판단하고, 화자 정보를 포함하는 화상 회의 컨텐츠를 생성한다. In S309, the video conference service providing apparatus 100 determines the user whose voice satisfied both conditions to be the speaker and generates video conference content including speaker information.

S311에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 생성된 화자 정보를 포함한 화상 회의 컨텐츠를 전송한다. In S311 , the video conference service providing device 100 transmits the video conference content including the generated speaker information to the user terminal 200 .

S313에서 화상 회의 서비스 제공 장치(100)는 사용자 음성의 세기가 기설정된 수치를 초과하지 않거나 사용자 음성의 지속시간이 기설정된 시간 미만일 경우, 화자 정보를 포함하지 않은 화상 회의 컨텐츠를 생성한다.In operation S313, the video conference service providing apparatus 100 generates video conference content that does not include speaker information when the intensity of the user's voice does not exceed a preset value or the duration of the user's voice is less than the preset time.

S315에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 화상 회의 컨텐츠를 전송한다.In S315 , the video conference service providing device 100 transmits video conference contents to the user terminal 200 .
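The S305 to S315 branch can be sketched as a two-stage check on each user's voice. This is a minimal illustration only: the threshold values, the decibel unit, and the names `VoiceSample`, `is_speaker`, and `build_content` are assumptions, since the source specifies only a "preset value" and a "preset time".

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not give concrete numbers or units.
INTENSITY_THRESHOLD_DB = -30.0
MIN_DURATION_SEC = 0.5


@dataclass
class VoiceSample:
    user_id: str
    intensity_db: float   # measured voice intensity
    duration_sec: float   # how long the voice has continued


def is_speaker(sample: VoiceSample,
               intensity_threshold: float = INTENSITY_THRESHOLD_DB,
               min_duration: float = MIN_DURATION_SEC) -> bool:
    # S305: the intensity must exceed the preset value.
    if sample.intensity_db <= intensity_threshold:
        return False
    # S307: the duration must be at least the preset time.
    return sample.duration_sec >= min_duration


def build_content(samples: list[VoiceSample]) -> dict:
    # S309 / S313: speaker information is attached only for users passing both checks.
    speakers = [s.user_id for s in samples if is_speaker(s)]
    return {"participants": [s.user_id for s in samples], "speakers": speakers}
```

Requiring both conditions filters out short loud noises (a cough, a door) as well as long but faint background sound, which is presumably why the flowchart checks intensity and duration in sequence.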

도 4는 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 화자 자동 표시 화면을 설명하는 도면이다. 4 is a diagram illustrating an automatic speaker display screen provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 4를 참조하면, 출력제어부(104)는 화상 회의 컨텐츠에서 화자로 판단된 사용자 영상을 강조하여 표시할 수 있다. 예를 들어, 출력제어부(104)는 화자로 판단된 사용자 영상의 테두리 색상을 변경(a)하거나, 테두리의 두께를 변경하거나(b), 사용자 영상의 크기를 확대할 수 있다(c). Referring to FIG. 4, the output control unit 104 may highlight the user image determined to be the speaker in the video conference content. For example, the output control unit 104 may change the border color of the user image determined to be the speaker (a), change the border thickness (b), or enlarge the user image (c).

도 5는 본 발명의 일 실시예에 따른 화자 자동 표시 방법을 설명하는 흐름도이다. 5 is a flowchart illustrating a method for automatically displaying a speaker according to an embodiment of the present invention.

도 5를 참조하면, S501에서 화상 회의 서비스 제공 장치(100)는 화자감지부(103)로부터 수신한 화자 정보를 확인한다. Referring to FIG. 5 , in S501 , the apparatus 100 for providing a video conference service checks speaker information received from the speaker detection unit 103 .

S503에서 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에서 화자로 판단된 사용자 영상을 강조한다. 예를 들어, 화상 회의 서비스 제공 장치(100)는 화자로 판단된 사용자 영상의 테두리 색상을 변경하거나, 테두리의 두께를 변경하거나, 사용자 영상의 크기를 확대할 수 있으며, 사용자의 설정에 의해 사용자 영상의 강조 방식이 선택될 수 있다. In S503, the video conference service providing apparatus 100 emphasizes the user image determined to be the speaker in the video conference content. For example, the video conference service providing apparatus 100 may change the border color of the user image determined to be the speaker, change the border thickness, or enlarge the user image, and the emphasis method for the user image may be selected according to the user's settings.

S505에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 사용자 영상이 강조된 화상 회의 컨텐츠를 전송한다. In S505 , the video conference service providing apparatus 100 transmits the video conference content in which the user image is emphasized to the user terminal 200 .
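The three emphasis options of FIG. 4 and the user-selectable method of S503 can be illustrated as follows. The `Tile` structure, the `Emphasis` enum, and the concrete color, width, and scale values are hypothetical; the source names only the three kinds of change.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Emphasis(Enum):
    BORDER_COLOR = "border_color"   # FIG. 4 (a)
    BORDER_WIDTH = "border_width"   # FIG. 4 (b)
    ENLARGE = "enlarge"             # FIG. 4 (c)


@dataclass(frozen=True)
class Tile:
    user_id: str
    border_color: str = "gray"
    border_width: int = 1
    scale: float = 1.0


def emphasize(tile: Tile, method: Emphasis) -> Tile:
    # Return a modified copy; the concrete values are illustrative only.
    if method is Emphasis.BORDER_COLOR:
        return replace(tile, border_color="red")
    if method is Emphasis.BORDER_WIDTH:
        return replace(tile, border_width=tile.border_width + 3)
    return replace(tile, scale=tile.scale * 1.5)


def render(tiles: list[Tile], speaker_ids: set[str], method: Emphasis) -> list[Tile]:
    # S503: only tiles of users judged to be speakers are emphasized.
    return [emphasize(t, method) if t.user_id in speaker_ids else t for t in tiles]
```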

도 6은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 화자 수동 표시 화면을 설명하는 도면이다. 6 is a diagram illustrating a manual display screen for a speaker provided by a video conference service providing device according to an embodiment of the present invention.

도 6을 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 이름표 생성 요청을 수신할 경우, 사용자의 이름을 추출하여 화상 회의 컨텐츠에 사용자의 이름을 표시할 수 있다. 또한, 화상 회의 서비스 제공 장치(100)는 테두리 강조 요청을 수신할 경우, 사용자 영상의 테두리 색상을 변경할 수 있다. 또한, 화상 회의 서비스 제공 장치(100)는 캐릭터 표시 요청을 수신할 경우, 사용자 영상을 캐릭터로 변환할 수 있다. 화상 회의 서비스 제공 장치(100)는 변경된 화상 회의 컨텐츠를 연결된 사용자 단말(200)로 전송할 수 있다. Referring to FIG. 6 , when receiving a name tag generation request from the user terminal 200 , the video conference service providing apparatus 100 may extract the user's name and display the user's name in the video conference content. In addition, the video conference service providing apparatus 100 may change the color of the border of the user's image when receiving a request for edge enhancement. Also, when receiving a character display request, the video conference service providing apparatus 100 may convert a user image into a character. The video conference service providing device 100 may transmit the changed video conference content to the connected user terminal 200 .

도 7은 본 발명의 일 실시예에 따른 화자 수동 표시 방법을 설명하는 흐름도이다. 7 is a flowchart illustrating a method for manually displaying a speaker according to an embodiment of the present invention.

도 7을 참조하면, S701에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 화자 수동 표시 요청을 수신한다. Referring to FIG. 7 , the video conference service providing device 100 receives a manual speaker display request from the user terminal 200 in S701 .

S702에서 화상 회의 서비스 제공 장치(100)는 이름표 생성 요청을 수신한 경우, S703으로 이동한다. When the video conferencing service providing device 100 receives a request for generating a name tag in S702, it moves to S703.

S703에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 수신한 화상 회의 데이터에서 사용자 이름을 추출한다. In S703, the video conference service providing apparatus 100 extracts a user name from the video conference data received from the user terminal 200.

S704에서 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에서 사용자의 영상에 사용자 이름표를 생성한다. In S704, the video conference service providing apparatus 100 creates a user name tag on the user's video in the video conference content.

S705에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 사용자 이름표가 생성된 화상 회의 컨텐츠를 전송한다. In S705, the video conference service providing device 100 transmits the video conference content for which the user name tag is generated to the user terminal 200.

S706에서 화상 회의 서비스 제공 장치(100)는 테두리 강조 요청을 수신한 경우 S707로 이동한다.When the video conference service providing device 100 receives a frame emphasis request in S706, it moves to S707.

S707에서 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에서 사용자 영상을 강조한다. 예를 들어, 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에서 수신한 사용자 영상의 테두리 색상을 변경하거나 두께를 변경할 수 있다. In S707, the video conference service providing device 100 emphasizes the user's image in the video conference content. For example, the video conference service providing apparatus 100 may change the color or thickness of the border of the user image received from the video conference content.

S708에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 사용자 영상이 강조된 화상 회의 컨텐츠를 전송한다. In S708 , the video conference service providing device 100 transmits the video conference content in which the user image is emphasized to the user terminal 200 .

S709에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 수신한 화상 회의 데이터에서 사용자 캐릭터를 추출한다. In S709, the video conference service providing device 100 extracts a user character from the video conference data received from the user terminal 200.

S710에서 화상 회의 서비스 제공 장치(100)는 사용자의 영상에 사용자 캐릭터를 생성한다. In S710, the video conference service providing apparatus 100 creates a user character in the user's image.

S711에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 사용자 캐릭터가 생성된 화상 회의 컨텐츠를 전송한다. In S711 , the video conference service providing device 100 transmits the video conference content in which the user character is created to the user terminal 200 .
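The three branches of FIG. 7 (name tag, border emphasis, character) amount to a dispatch on the request type. A minimal sketch, assuming a tile is a plain dictionary and that the user name and character are available in a hypothetical `profile` record taken from the video conference data:

```python
def apply_manual_display(tile: dict, request: str, profile: dict) -> dict:
    """Return a copy of the tile changed according to the manual display request."""
    updated = dict(tile)
    if request == "name_tag":
        # S703 to S704: extract the user name and attach it as a name tag.
        updated["name_tag"] = profile["name"]
    elif request == "border":
        # S707: emphasize the user image (here, by changing the border color).
        updated["border_color"] = "red"
    elif request == "character":
        # S709 to S710: replace the user image with the user's character.
        updated["video"] = profile["character"]
    else:
        raise ValueError(f"unknown request: {request}")
    return updated
```

The request strings `"name_tag"`, `"border"`, and `"character"` are placeholders for whatever message types the terminal actually sends.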

도 8은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 편집 화면을 설명하는 도면이다. 8 is a diagram illustrating an editing screen provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 8을 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 화면 편집 요청을 수신할 경우 화상 회의 컨텐츠를 드래그 앤 드롭 하여 원하는 영상을 분리하거나, 리사이징 하여 영상의 크기를 변경할 수 있다. Referring to FIG. 8, when receiving a screen editing request from the user terminal 200, the video conference service providing apparatus 100 may separate a desired video by dragging and dropping it within the video conference content, or change the size of a video by resizing it.

도 9는 본 발명의 일 실시예에 따른 화면 편집 방법을 설명하는 흐름도이다.9 is a flowchart illustrating a screen editing method according to an embodiment of the present invention.

도 9를 참조하면, S901에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 화면 편집 요청을 수신한다. Referring to FIG. 9 , the video conference service providing device 100 receives a screen editing request from the user terminal 200 in S901 .

S902에서 화상 회의 서비스 제공 장치(100)는 드래그 앤 드롭 요청을 수신한 경우, S903으로 이동한다.When the video conference service providing apparatus 100 receives a drag and drop request in S902, it moves to S903.

S903에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 수신한 화상 회의 데이터에서 사용자 이름을 추출한다. In S903, the video conference service providing apparatus 100 extracts a user name from the video conference data received from the user terminal 200.

S904에서 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에 포함된 사용자의 영상에 사용자 이름표를 생성한다. In S904, the video conference service providing apparatus 100 creates a user name tag on the user's image included in the video conference content.

S905에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 사용자 이름표가 생성된 화상 회의 컨텐츠를 전송한다. In S905, the video conference service providing apparatus 100 transmits the video conference content for which the user name tag is generated to the user terminal 200.

S906에서 화상 회의 서비스 제공 장치(100)는 테두리 강조 요청을 수신한 경우 화상 회의 컨텐츠에서 사용자 영상을 강조한다. 예를 들어, 화상 회의 서비스 제공 장치(100)는 화상 회의 컨텐츠에 포함된 사용자 영상의 테두리 색상을 변경하거나 두께를 변경할 수 있다. In S906, the video conference service providing apparatus 100 emphasizes the user image in the video conference content when it receives a border emphasis request. For example, the video conference service providing apparatus 100 may change the color or thickness of the border of the user image included in the video conference content.

도 10은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 자막 화면을 설명하는 도면이다. 10 is a diagram illustrating a caption screen provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 10을 참조하면, 화상 회의 서비스 제공 장치(100)는 자막생성부(105)로부터 생성된 자막 정보를 표시할 수 있다. 예를 들어, 화상 회의 서비스 제공 장치(100)는 자막 정보를 기반으로 화상 회의 컨텐츠에 자막 화면을 생성할 수 있다. Referring to FIG. 10 , the video conference service providing device 100 may display caption information generated by the caption generator 105 . For example, the video conference service providing apparatus 100 may generate a caption screen for video conference content based on caption information.

도 11은 본 발명의 일 실시예에 따른 자막 제공 방법을 설명하는 흐름도이다. 11 is a flowchart illustrating a method of providing captions according to an embodiment of the present invention.

도 11을 참조하면, S1101에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 자막 요청을 수신한다. Referring to FIG. 11 , the video conference service providing device 100 receives a caption request from the user terminal 200 in S1101.

S1102에서 화상 회의 서비스 제공 장치(100)는 화상 회의 데이터에서 사용자 음성을 추출하여 STT(Speech To Text)를 기반으로 자막 정보로 변환한다. In S1102, the video conference service providing apparatus 100 extracts the user's voice from the video conference data and converts it into subtitle information based on STT (Speech To Text).

S1103에서 화상 회의 서비스 제공 장치(100)는 자막 정보를 기반으로 화상 회의 컨텐츠에 자막 화면을 생성한다. In S1103, the video conference service providing apparatus 100 generates a caption screen for the video conference content based on the caption information.

S1104에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 자막 화면이 생성된 화상 회의 컨텐츠를 전송한다.In operation S1104, the video conference service providing device 100 transmits the video conference content for which the caption screen is generated to the user terminal 200.

도 12는 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 통번역 화면을 설명하는 도면이다. 12 is a diagram illustrating an interpretation/translation screen provided by a video conference service providing device according to an embodiment of the present invention.

도 12를 참조하면, 화상 회의 서비스 제공 장치(100)는 통번역부(106)로부터 생성된 통번역 정보를 표시할 수 있다. 예를 들어, 화상 회의 서비스 제공 장치(100)는 사용자의 통번역 언어 선택에 따라 생성된 통번역 정보를 기반으로 화상 회의 컨텐츠에 텍스트 및/또는 음성을 생성할 수 있다. Referring to FIG. 12 , the video conference service providing apparatus 100 may display interpretation and translation information generated by the interpretation and translation unit 106 . For example, the video conference service providing apparatus 100 may generate text and/or voice in video conference content based on interpretation information generated according to a user's selection of an interpretation language.

도 13은 본 발명의 일 실시예에 따른 통번역 제공 방법을 설명하는 흐름도이다.13 is a flowchart illustrating a method of providing interpretation and translation according to an embodiment of the present invention.

도 13을 참조하면, S1301에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 통번역 요청을 수신한다. Referring to FIG. 13 , the video conference service providing apparatus 100 receives an interpretation/translation request from the user terminal 200 in S1301.

S1302에서 화상 회의 서비스 제공 장치(100)는 화상 회의 데이터에서 사용자 음성을 추출하여 음성 데이터의 언어를 판단한다. In S1302, the video conference service providing apparatus 100 extracts the user's voice from the video conference data and determines the language of the voice data.

S1303에서 화상 회의 서비스 제공 장치(100)는 통번역 요청 언어에 대응하는 번역 텍스트를 생성한다.In S1303, the video conference service providing device 100 generates translated text corresponding to the language requested for interpretation and translation.

S1304에서 화상 회의 서비스 제공 장치(100)는 TTS(Text To Speech)를 기반으로 번역 텍스트에 대한 통역 음성 정보를 생성한다. In S1304, the video conference service providing apparatus 100 generates interpreted voice information for the translated text based on TTS (Text To Speech).

S1305에서 화상 회의 서비스 제공 장치(100)는 화상 회의 데이터에 통역 음성 및 번역 텍스트를 포함한다. In S1305, the video conference service providing device 100 includes the interpretation voice and the translated text in the video conference data.

S1306에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로 통역 음성 및 번역 텍스트가 포함된 화상 회의 컨텐츠를 전송한다.In operation S1306, the video conference service providing device 100 transmits the video conference content including the interpretation voice and translated text to the user terminal 200.
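Steps S1302 to S1305 form a linear pipeline: language determination, translation, speech synthesis, and packaging. The sketch below passes the three engines in as callables because the source does not name any concrete speech or translation engine; `interpret` and the `fake_*` stand-ins are hypothetical, and folding speech recognition into the language-determination step is an assumption.

```python
from typing import Callable


def interpret(audio: bytes,
              target_lang: str,
              stt: Callable[[bytes], tuple[str, str]],
              translate: Callable[[str, str, str], str],
              tts: Callable[[str, str], bytes]) -> dict:
    # S1302: recognize the speech and determine its language.
    source_lang, text = stt(audio)
    # S1303: generate translated text in the requested language.
    translated = translate(text, source_lang, target_lang)
    # S1304: synthesize the interpreted voice from the translated text.
    voice = tts(translated, target_lang)
    # S1305: bundle both so they can be included in the conference content.
    return {"source_lang": source_lang,
            "translated_text": translated,
            "interpreted_voice": voice}


# Stand-in engines for demonstration only; they perform no real recognition.
def fake_stt(audio: bytes) -> tuple[str, str]:
    return "ko", audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")
```

Injecting the engines keeps the step order testable on its own and lets the translated text and the synthesized voice be produced from the same intermediate result.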

도 14는 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 발언 요청 화면을 설명하는 도면이다. 14 is a diagram illustrating a speech request screen provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 14를 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 발언 요청을 수신할 경우 발언 중인 화자를 확인하여 사용자 음성을 제어할 수 있다. Referring to FIG. 14 , when receiving a speech request from the user terminal 200 , the video conferencing service providing apparatus 100 may check a speaker who is speaking and control the user's voice.

도 15는 본 발명의 일 실시예에 따른 발언 요청 처리 방법을 설명하는 흐름도이다. FIG. 15 is a flowchart illustrating a method for handling a speech request according to an embodiment of the present invention.

도 15를 참조하면, S1501에서 화상 회의 서비스 제공 장치(100)는 사용자 단말(200)로부터 발언 요청을 수신한다. Referring to FIG. 15 , the video conference service providing apparatus 100 receives a speech request from the user terminal 200 in S1501.

S1502에서 화상 회의 서비스 제공 장치(100)는 화자감지부(103)로부터 발언 중인 화자가 존재하는지 확인한다. In S1502, the video conference service providing apparatus 100 checks, via the speaker detection unit 103, whether a speaker is currently speaking.

S1503에서 화상 회의 서비스 제공 장치(100)는 화자가 존재할 경우 S1502로 이동한다. 화상 회의 서비스 제공 장치(100)는 화자가 존재하지 않을 경우 S1504로 이동한다. In S1503, the video conference service providing apparatus 100 moves to S1502 when there is a speaker. The video conference service providing device 100 moves to S1504 when no speaker exists.

S1504에서 화상 회의 서비스 제공 장치(100)는 요청한 사용자 음성만을 출력하여 연결된 사용자 단말(200)로 전송한다. In S1504, the video conference service providing apparatus 100 outputs only the requested user voice and transmits it to the connected user terminal 200.
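The loop between S1502 and S1503 means a speech request waits until no one is speaking, after which only the requester's voice is output. One way to model this is shown below; the `FloorControl` class and the use of a queue for multiple pending requests are assumptions, since the source describes a single request.

```python
from collections import deque
from typing import Optional


class FloorControl:
    def __init__(self) -> None:
        self.pending: deque[str] = deque()
        self.floor_holder: Optional[str] = None

    def request_floor(self, user_id: str) -> None:
        # S1501: a terminal asks for the right to speak.
        if user_id not in self.pending:
            self.pending.append(user_id)

    def tick(self, active_speaker: Optional[str]) -> Optional[str]:
        # S1502 / S1503: while someone is speaking, keep waiting.
        if active_speaker is not None:
            return self.floor_holder
        # S1504: nobody is speaking, so the next requester gets the floor.
        if self.pending:
            self.floor_holder = self.pending.popleft()
        return self.floor_holder

    def audible(self, user_id: str) -> bool:
        # Only the floor holder's voice is forwarded to the other terminals.
        return user_id == self.floor_holder
```

`tick` would be called periodically with the current output of the speaker detection step (or `None` when no speaker is detected).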

도 16은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 음소거 상태 알림 화면을 설명하는 도면이다. 16 is a diagram illustrating a mute state notification screen provided by a video conference service providing device according to an embodiment of the present invention.

도 16을 참조하면, 화상 회의 서비스 제공 장치(100)는 발언 요청을 통해 발언권을 받은 사용자 단말(200)이 음소거 상태일 때, 음소거 상태를 알려주는 화면을 생성할 수 있다. 이 때, 출력제어부(104)는 화상 회의 컨텐츠에 음소거된 사용자의 영상에 대해 음소거 상태를 알리는 아이콘을 생성하거나 테두리 색상을 변경할 수 있다. Referring to FIG. 16, when the user terminal 200 that has been granted the right to speak through a speech request is muted, the video conference service providing apparatus 100 may generate a screen indicating the muted state. In this case, the output control unit 104 may add an icon indicating the muted state to the muted user's video in the video conference content, or change its border color.

도 17은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 방법을 설명하는 흐름도이다.17 is a flowchart illustrating a video conference service providing method according to an embodiment of the present invention.

도 17을 참조하면, 화상 회의 서비스 제공 장치(100)는 복수의 사용자 단말(200')과 연동되어 회의실을 생성하고, 복수의 사용자를 회의실에 초대한다. Referring to FIG. 17 , the video conference service providing device 100 interworks with a plurality of user terminals 200' to create a conference room and invites a plurality of users to the conference room.

화상 회의 서비스 제공 장치(100)는 복수의 사용자 단말(200')과 영상 및 음성이 연결된다.The video conference service providing device 100 is connected to a plurality of user terminals 200' through video and audio.

사용자가 발언을 시작하거나 녹음을 시작하면, 화상 회의 서비스 제공 장치(100)는 화자를 인식하여 화자를 감지하고 선정한다. When a user starts speaking or recording, the video conference service providing apparatus 100 identifies the speaking user, and detects and selects that user as the speaker.

화상 회의 서비스 제공 장치(100)는 감지된 화자의 정보를 복수의 사용자 단말(200')로 전송한다.The video conferencing service providing apparatus 100 transmits information on the detected speaker to a plurality of user terminals 200'.

복수의 사용자 단말(200')은 화자 화면을 표시한다.A plurality of user terminals 200' display a speaker screen.

사용자가 발언을 멈추거나 녹음을 중지하면, 화상 회의 서비스 제공 장치(100)는 화자의 발언이 멈춘 것을 인식하여 화자 감지를 해제한다.When the user stops talking or recording, the video conferencing service providing apparatus 100 recognizes that the speaker has stopped speaking and cancels speaker detection.

화상 회의 서비스 제공 장치(100)는 복수의 사용자 단말(200')로 화자 해제 정보를 전송한다. The video conference service providing apparatus 100 transmits speaker release information to a plurality of user terminals 200'.

복수의 사용자 단말(200')은 화자 화면 표시를 해제한다.The plurality of user terminals 200' release the display of the speaker screen.

회의가 끝나면, 복수의 사용자 단말(200')은 녹음된 파일을 화상 회의 서비스 제공 장치(100)로 전송한다. When the conference ends, the plurality of user terminals 200' transmit the recorded file to the video conference service providing device 100.

화상 회의 서비스 제공 장치(100)는 녹음파일 및 화자 정보를 정보 저장소 또는 파일 저장소에 저장한다. The video conference service providing apparatus 100 stores a recorded file and speaker information in an information storage or file storage.
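The speaker set and release messages exchanged in FIG. 17 can be modeled as a small state holder that broadcasts only on transitions. The message tuples and the `SpeakerNotifier` class are illustrative assumptions; the source does not define a message format.

```python
class SpeakerNotifier:
    def __init__(self, terminals: dict[str, list]) -> None:
        # terminal id -> list of messages delivered to that terminal
        self.terminals = terminals
        self.current = None

    def _broadcast(self, message: tuple) -> None:
        for inbox in self.terminals.values():
            inbox.append(message)

    def on_voice(self, user_id: str, speaking: bool) -> None:
        if speaking and self.current != user_id:
            # The user started speaking: send the detected speaker's information.
            self.current = user_id
            self._broadcast(("speaker_set", user_id))
        elif not speaking and self.current == user_id:
            # The speaker stopped: send speaker release information.
            self.current = None
            self._broadcast(("speaker_released", user_id))
```

Broadcasting only on a change of state keeps repeated detections of the same speaker from flooding the terminals with identical messages.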

도 18은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 메모 내용이 포함된 회의록을 설명하는 도면이다. 18 is a diagram illustrating meeting minutes including memo contents provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 18을 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자가 회의 중 회의 관련 메모를 기록할 경우, 회의록 생성 시 메모 정보를 포함한 회의록을 생성한다. Referring to FIG. 18 , when a user records meeting-related memos during a meeting, the video conferencing service providing apparatus 100 creates meeting minutes including memo information when generating meeting minutes.

도 19는 본 발명의 일 실시예에 따른 회의록 제공 방법을 설명하는 흐름도이다.19 is a flowchart illustrating a method for providing meeting minutes according to an embodiment of the present invention.

도 19를 참조하면, S1901에서 화상 회의 서비스 제공 장치(100)는 회의가 종료되면 회의를 통해 생성된 회의 정보를 저장한다. Referring to FIG. 19, in S1901, when the meeting ends, the video conference service providing apparatus 100 stores the meeting information generated through the meeting.

S1902에서 화상 회의 서비스 제공 장치(100)는 회의 정보에 메모 정보가 포함된 경우, S1903으로 이동한다. In S1902, if the meeting information includes memo information, the video conference service providing apparatus 100 proceeds to S1903.

S1903에서 화상 회의 서비스 제공 장치(100)는 메모 내용을 포함한 회의록을 생성한다. In S1903, the video conference service providing device 100 generates meeting minutes including memo contents.

S1904에서 화상 회의 서비스 제공 장치(100)는 회의 정보에 메모 정보가 포함되지 않을 경우 사용자 요청에 따른 편집 회의록을 생성한다. In operation S1904, when the memo information is not included in the meeting information, the video conference service providing device 100 generates edited meeting minutes according to the user's request.
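The memo branch of S1902 and S1903 can be sketched as an optional merge step. Interleaving memos with utterances by timestamp is an assumption made for illustration; the source says only that the memo information is included in the minutes.

```python
from typing import Optional


def build_minutes(utterances: list[tuple[float, str, str]],
                  memos: Optional[list[tuple[float, str]]] = None) -> list[str]:
    """utterances: (time_sec, speaker, text); memos: (time_sec, text)."""
    entries = [(t, f"{speaker}: {text}") for t, speaker, text in utterances]
    if memos:
        # S1902 -> S1903: memo information exists, so merge it into the minutes.
        entries += [(t, f"[memo] {text}") for t, text in memos]
    # Order all entries by time; sorting is stable for equal timestamps.
    entries.sort(key=lambda e: e[0])
    return [line for _, line in entries]
```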

도 20은 본 발명의 일 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 편집 회의록을 설명하는 도면이다. FIG. 20 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to an embodiment of the present invention.

도 20을 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자가 검색한 회의 내용과 대응하는 회의 정보를 추출하여 편집 회의록을 생성한다. 예를 들어, 사용자가 “발언내용”을 검색할 경우, 화상 회의 서비스 제공 장치(100)는 “발언내용”과 대응하는 문자 정보를 추출할 수 있다. 여기서, 화상 회의 서비스 제공 장치(100)는 추출된 문자 정보를 화자 별로 구분하여 표시할 수 있다.Referring to FIG. 20 , the apparatus 100 for providing a video conferencing service extracts conference information corresponding to the content of a conference searched by a user and generates edited conference minutes. For example, when a user searches for “contents of speech”, the video conferencing service providing apparatus 100 may extract text information corresponding to “contents of speech”. Here, the video conference service providing apparatus 100 may classify and display the extracted text information for each speaker.

도 21은 본 발명의 일 실시예에 따른 편집 회의록 제공 방법을 설명하는 흐름도이다. FIG. 21 is a flowchart illustrating a method for providing edited meeting minutes according to an embodiment of the present invention.

도 21을 참조하면, S2101에서 화상 회의 서비스 제공 장치(100)는 사용자 단말로부터 회의 내용 검색 요청을 수신한다. Referring to FIG. 21 , in S2101, the video conference service providing apparatus 100 receives a conference content search request from the user terminal.

S2102에서 화상 회의 서비스 제공 장치(100)는 검색된 회의 내용과 대응하는 문자 정보를 추출한다. In S2102, the video conference service providing device 100 extracts text information corresponding to the searched conference contents.

S2103에서 화상 회의 서비스 제공 장치(100)는 화자 별로 문자 정보를 구분한 회의 내용을 생성한다.In S2103, the video conference service providing device 100 creates conference contents by classifying text information for each speaker.

S2104에서 화상 회의 서비스 제공 장치(100)는 생성된 회의 내용을 기반으로 편집 회의록을 생성한다. In S2104, the video conference service providing apparatus 100 generates edited meeting minutes based on the generated meeting content.
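Steps S2102 and S2103 (extract the text matching the search, then separate it by speaker) might look like the following. The record layout with `speaker` and `text` keys and the plain substring match are assumptions; the source does not specify how matching is performed.

```python
from collections import defaultdict


def search_minutes(records: list[dict], keyword: str) -> dict[str, list[str]]:
    grouped: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        # S2102: keep only text information that matches the searched content.
        if keyword in rec["text"]:
            # S2103: separate the matching text by speaker.
            grouped[rec["speaker"]].append(rec["text"])
    return dict(grouped)
```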

도 22는 본 발명의 다른 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 편집 회의록을 설명하는 도면이다. FIG. 22 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.

도 22를 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자가 선택한 화자에 대응하는 문자 정보를 포함한 편집 회의록을 생성한다. 예를 들어, 사용자가 user1, user2 및 user3를 선택할 경우, 화상 회의 서비스 제공 장치(100)는 user1, user2 및 user3에 대응하는 문자 정보만을 추출할 수 있다. Referring to FIG. 22, the video conference service providing apparatus 100 generates edited meeting minutes including text information corresponding to the speakers selected by a user. For example, when the user selects user1, user2, and user3, the video conference service providing apparatus 100 may extract only the text information corresponding to user1, user2, and user3.

도 23은 본 발명의 다른 실시예에 따른 편집 회의록 제공 방법을 설명하는 흐름도이다. FIG. 23 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.

도 23을 참조하면, S2301에서 화상 회의 서비스 제공 장치(100)는 사용자 단말로부터 화자 한정 회의 내용 검색 요청을 수신한다. Referring to FIG. 23 , in S2301, the apparatus 100 for providing a video conference service receives a speaker-limited conference content search request from a user terminal.

S2302에서 화상 회의 서비스 제공 장치(100)는 선택된 화자에 대응하는 문자 정보를 추출한다. In S2302, the video conference service providing apparatus 100 extracts text information corresponding to the selected speaker.

S2303에서 화상 회의 서비스 제공 장치(100)는 화자 별 문자 정보를 구분한 회의 내용을 생성한다. 여기서, 화상 회의 서비스 제공 장치(100)는 해당 문자 정보에 대응하는 회의 시간을 회의 내용에 포함할 수도 있다. In S2303, the video conference service providing device 100 generates conference content by classifying text information for each speaker. Here, the video conference service providing device 100 may include a conference time corresponding to the text information in the content of the conference.

S2304에서 화상 회의 서비스 제공 장치(100)는 생성된 회의 내용을 기반으로 편집 회의록을 생성한다. In S2304, the video conference service providing apparatus 100 generates edited meeting minutes based on the generated meeting content.
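The speaker-limited search of S2302 and S2303, including the optional meeting time mentioned in S2303, can be illustrated as below. The `time_sec` field and the `[MM:SS]` output format are hypothetical choices.

```python
def filter_by_speakers(records: list[dict], selected: set[str]) -> list[str]:
    lines = []
    for rec in records:
        # S2302: keep only text information of the selected speakers.
        if rec["speaker"] in selected:
            # S2303: label each line with its speaker and its meeting time.
            minutes, seconds = divmod(int(rec["time_sec"]), 60)
            lines.append(f"[{minutes:02d}:{seconds:02d}] {rec['speaker']}: {rec['text']}")
    return lines
```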

도 24는 본 발명의 다른 실시예에 따른 화상 회의 서비스 제공 장치가 제공하는 편집 회의록을 설명하는 도면이다. FIG. 24 is a diagram illustrating edited meeting minutes provided by a video conference service providing apparatus according to another embodiment of the present invention.

도 24를 참조하면, 화상 회의 서비스 제공 장치(100)는 사용자가 검색한 회의 시간에 대응하는 문자 정보를 포함한 편집 회의록을 생성한다. 예를 들어, 사용자가 “00:00~01:00”를 검색할 경우, 화상 회의 서비스 제공 장치(100)는 해당 시간에 대응하는 문자 정보를 추출할 수 있다. 여기서, 화상 회의 서비스 제공 장치(100)는 추출된 문자 정보를 화자 별로 구분하여 표시할 수 있다.Referring to FIG. 24 , the apparatus 100 for providing a video conference service generates edited meeting minutes including text information corresponding to a meeting time searched by a user. For example, when a user searches for “00:00~01:00”, the video conference service providing apparatus 100 may extract text information corresponding to the corresponding time. Here, the video conference service providing apparatus 100 may classify and display the extracted text information for each speaker.

도 25는 본 발명의 다른 실시예에 따른 편집 회의록 제공 방법을 설명하는 흐름도이다. FIG. 25 is a flowchart illustrating a method for providing edited meeting minutes according to another embodiment of the present invention.

도 25를 참조하면, S2501에서 화상 회의 서비스 제공 장치(100)는 사용자 단말로부터 시간 한정 회의 내용 검색 요청을 수신한다.Referring to FIG. 25 , in S2501, the apparatus 100 for providing a video conference service receives a search request for content of a time-limited conference from a user terminal.

S2502에서 화상 회의 서비스 제공 장치(100)는 검색된 회의 시간과 대응하는 문자 정보를 추출한다. In S2502, the video conference service providing apparatus 100 extracts text information corresponding to the searched conference time.

S2503에서 화상 회의 서비스 제공 장치(100)는 화자 별로 문자 정보를 구분한 회의 내용을 생성한다.In S2503, the video conference service providing device 100 generates conference content by classifying text information for each speaker.

S2504에서 화상 회의 서비스 제공 장치(100)는 생성된 회의 내용을 기반으로 편집 회의록을 생성한다.In S2504, the video conference service providing apparatus 100 generates editing minutes based on the created conference content.
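Steps S2501 to S2504 can be illustrated with a minimal sketch. The `Utterance` record, the function name, the use of seconds as the time unit, and the half-open range `start <= t < end` are assumptions made for illustration; the description does not specify how text information is stored or how range boundaries are handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # hypothetical speaker label
    start: int    # seconds from the start of the conference (assumed unit)
    text: str     # text information obtained from the speech


def edited_minutes_for_range(utterances, start, end):
    """Group the text of utterances with start <= t < end by speaker."""
    by_speaker = defaultdict(list)
    # Walk the utterances in conference-time order so each speaker's
    # lines keep their chronological order.
    for u in sorted(utterances, key=lambda u: u.start):
        if start <= u.start < end:
            by_speaker[u.speaker].append(u.text)
    return dict(by_speaker)


utterances = [
    Utterance("A", 5, "Let's begin."),
    Utterance("B", 42, "I have the report."),
    Utterance("A", 75, "Next item."),
]
# A search for "00:00~01:00" is modeled here as the range 0..60 seconds.
minutes = edited_minutes_for_range(utterances, 0, 60)
```

In this sketch the utterance at 75 seconds falls outside the searched range and is left out of the edited minutes.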

FIG. 26 is a diagram illustrating edited meeting minutes provided by a video conference service providing device according to another embodiment of the present invention.

Referring to FIG. 26, the video conference service providing device 100 may generate conference participation information using the conference information and may generate meeting minutes that include the conference participation information. Here, the conference participation information is information that includes a graph representing the degree of conference participation of each speaker. The video conference service providing device 100 may generate the conference participation information for each speaker from the conference information according to a preset criterion. The video conference service providing device 100 may provide the conference participation information for each speaker in various graph formats, such as a bar graph or a pie chart.

FIG. 27 is a flowchart illustrating a method of providing edited meeting minutes according to another embodiment of the present invention.

Referring to FIG. 27, in S2701 the video conference service providing device 100 receives a time-limited conference content search request from a user terminal.

In S2702, the video conference service providing device 100 generates conference participation information using the conference information.

In S2703, the video conference service providing device 100 generates edited meeting minutes based on the generated conference participation information.
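As a rough illustration of generating per-speaker participation information, the sketch below assumes that the preset criterion is each speaker's share of total speaking time. That criterion, the function names, and the text-based bar graph are assumptions; the description leaves the criterion and the rendering open.

```python
def participation_rates(speaking_seconds):
    """Return each speaker's share of total speaking time, in percent."""
    total = sum(speaking_seconds.values())
    if total == 0:
        # Nobody spoke: report zero participation for every speaker.
        return {speaker: 0.0 for speaker in speaking_seconds}
    return {s: 100.0 * t / total for s, t in speaking_seconds.items()}


def text_bar_chart(rates, width=20):
    """Render the rates as simple text bars, highest participation first."""
    lines = []
    for speaker, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * rate / 100)
        lines.append(f"{speaker:<4}{bar} {rate:.1f}%")
    return lines


rates = participation_rates({"A": 120, "B": 60, "C": 20})
chart = text_bar_chart(rates)
```

Other criteria, such as the number of utterances per speaker, could be substituted by changing only the input mapping.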

FIG. 28 is a diagram illustrating edited meeting minutes provided by a video conference service providing device according to another embodiment of the present invention.

Referring to FIG. 28, when the video conference service providing device 100 receives interpretation/translation information from the interpretation/translation unit 106, it generates edited meeting minutes using the translated text.

FIG. 29 is a flowchart illustrating a method of providing edited meeting minutes according to another embodiment of the present invention.

Referring to FIG. 29, in S2901 the video conference service providing device 100 receives interpretation/translation information from the interpretation/translation unit 106.

In S2902, the video conference service providing device 100 generates conference content using the translated text.

In S2903, the video conference service providing device 100 generates edited meeting minutes based on the generated conference content.

FIG. 30 is a diagram illustrating an attendance sheet provided by a video conference service providing device according to another embodiment of the present invention.

Referring to FIG. 30, the video conference service providing device 100 receives attendance screens captured before the conference starts and during the conference, and performs a procedure for confirming whether they show the same person. When face recognition determines that the person is the same, the video conference service providing device 100 generates attendance information; when face recognition determines that the person is different, or when no attendance screen has been received, it generates absence information. Here, the absence information may include the reason for the absence determination. For example, when face recognition determines that the person in the attendance screen captured before the conference and the person in the attendance screen captured during the conference are different, the video conference service providing device 100 may include text indicating a face recognition mismatch on the attendance sheet screen.

FIG. 31 is a flowchart illustrating a method of providing an attendance sheet according to another embodiment of the present invention.

Referring to FIG. 31, in S3101 the video conference service providing device 100 receives a conference attendance screen that includes images of the users who have joined the conference room before the conference starts.

In S3102, the video conference service providing device 100 receives a conference attendance screen that includes images of the users who are in the conference room during the conference.

In S3103, the video conference service providing device 100 uses AI to recognize the users' faces included in the screens captured before and during the conference.

In S3104, when face recognition determines that the person is the same, the video conference service providing device 100 proceeds to S3105.

In S3105, the video conference service providing device 100 generates attendance information for each user determined to be the same person.

In S3106, the video conference service providing device 100 generates absence information for each user not determined to be the same person.

In S3107, the video conference service providing device 100 generates an attendance sheet using the generated attendance and absence information.
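The decision logic of S3104 to S3107 can be sketched as follows. The face comparison itself is passed in as a callable (`same_person`), since the description only states that it is AI-based; that callable, the dictionary layout, and the placeholder face identifiers are all hypothetical.

```python
def build_attendance_sheet(participants, before, during, same_person):
    """Return {name: (status, reason)} from the pre- and mid-conference screens."""
    sheet = {}
    for name in participants:
        b, d = before.get(name), during.get(name)
        if b is None or d is None:
            # No attendance screen received for one of the two checks.
            sheet[name] = ("absent", "no attendance screen received")
        elif same_person(b, d):
            sheet[name] = ("present", "")
        else:
            # Recorded so the attendance sheet can show the mismatch text.
            sheet[name] = ("absent", "face recognition mismatch")
    return sheet


# Stand-in matcher that compares placeholder face identifiers;
# a real system would compare face images or embeddings instead.
sheet = build_attendance_sheet(
    ["kim", "lee", "park"],
    before={"kim": "face-1", "lee": "face-2", "park": "face-3"},
    during={"kim": "face-1", "lee": "face-9"},
    same_person=lambda a, b: a == b,
)
```

Keeping the reason alongside the status mirrors the statement that absence information may include the basis for the absence determination.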

FIG. 32 is a flowchart illustrating a typo correction method according to another embodiment of the present invention.

Referring to FIG. 32, in S3201 the video conference service providing device 100 stores the information generated through the video conference. For example, the video conference service providing device 100 stores text information, subtitle information, interpretation/translation information, and the like.

In S3202, the video conference service providing device 100 determines whether an unconverted sentence exists in the stored text information, subtitle information, or interpretation/translation information.

In S3203, the video conference service providing device 100 determines whether a typo exists in the unconverted sentence.

In S3204, when a typo exists, the video conference service providing device 100 marks the phrase containing the typo separately and stores it.

In S3205, the video conference service providing device 100 converts the entire unconverted sentence into a specific phrase and stores it. For example, the video conference service providing device 100 may convert the entire unconverted sentence into “?” and store it.
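The flow of S3202 to S3205 admits more than one reading. The sketch below follows one possible interpretation: an unconverted sentence in which a typo can be identified is stored with the typo marked, and any other unconverted sentence is replaced entirely by “?”. The `converted` flag, the `known_typos` set, and the bracket marking are assumptions introduced only for illustration.

```python
def correct_records(records, known_typos):
    """Process (text, converted) pairs and return the sentences to store."""
    stored = []
    for text, converted in records:
        if converted:
            # Properly converted sentences are stored unchanged.
            stored.append(text)
            continue
        words = text.split()
        if any(w in known_typos for w in words):
            # Mark each typo separately, as in S3204.
            stored.append(
                " ".join(f"[{w}]" if w in known_typos else w for w in words)
            )
        else:
            # Replace the whole unconverted sentence, as in S3205.
            stored.append("?")
    return stored


stored = correct_records(
    [
        ("Meeting starts now", True),
        ("pleese share screen", False),
        ("@@##", False),
    ],
    known_typos={"pleese"},
)
```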

FIG. 33 is a flowchart illustrating a method of providing a video conference service according to another embodiment of the present invention.

Referring to FIG. 33, the video conference service providing device 100 works in conjunction with a plurality of user terminals 200' to create a conference room and invites a plurality of users to the conference room.

The video conference service providing device 100 stores basic conference information input through the plurality of user terminals 200'. Here, the basic conference information may include a conference ID, a list of attendees, a conference title, and the like.

The video conference service providing device 100 records the users' speech and stores speaker information and a recording file for each user.

When the conference ends, the video conference service providing device 100 stores the conference end time.

The video conference service providing device 100 arranges the files in order of conference time using the stored speaker information and recording files.

The video conference service providing device 100 organizes the conference content based on STT (Speech To Text).

The video conference service providing device 100 summarizes the conference content based on AI.

The video conference service providing device 100 generates the meeting minutes in various file formats.

The video conference service providing device 100 transmits the generated meeting minutes to the plurality of user terminals 200'.
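The post-conference part of this flow (ordering recordings by time, converting speech to text, and producing minutes in more than one file format) might look like the sketch below. The record fields, the `transcribe` callable standing in for an STT engine, and the choice of plain text and JSON as output formats are assumptions; the AI-based summarization step is omitted.

```python
import json


def build_minutes(title, recordings, transcribe):
    """Order recordings by start time and transcribe each one."""
    ordered = sorted(recordings, key=lambda r: r["start"])
    entries = [
        {"start": r["start"], "speaker": r["speaker"], "text": transcribe(r["file"])}
        for r in ordered
    ]
    return {"title": title, "entries": entries}


def to_text(minutes):
    """Render the minutes as plain text, one line per utterance."""
    lines = [minutes["title"]]
    lines += [f'[{e["start"]}s] {e["speaker"]}: {e["text"]}' for e in minutes["entries"]]
    return "\n".join(lines)


# A dictionary lookup stands in for a real speech-to-text engine.
fake_stt = {"a.wav": "Shall we start?", "b.wav": "Agreed."}
minutes = build_minutes(
    "Weekly sync",
    [
        {"speaker": "B", "start": 30, "file": "b.wav"},
        {"speaker": "A", "start": 0, "file": "a.wav"},
    ],
    fake_stt.get,
)
as_text = to_text(minutes)
as_json = json.dumps(minutes, ensure_ascii=False)
```

Building one in-memory structure first and serializing it afterwards keeps each additional file format a small, separate function.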

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although the operations are shown in the drawings in a particular order, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the illustrated operations must be performed, to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as requiring such separation, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered from an illustrative point of view rather than a limiting one. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope equivalent thereto should be construed as being included in the present invention.

100: video conference service providing device

Claims

A video conference service providing device comprising:
a communication unit that communicates with a user terminal and receives video conference data, which is identification information including at least one of a user image, a user voice, a name, a character, a date of birth, and an IP address;
a conference content unit that receives the video conference data from the user terminal to generate video conference content, and performs an invitation confirmation procedure for determining whether participation in the generated video conference content is possible;
a speaker detection unit that determines a speaker based on the voice intensity and voice duration of the user voice included in the video conference data, and determines the user voice to be that of a speaker when the voice intensity exceeds a preset value and the duration of the user voice is longer than a preset time;
an output control unit that controls the video conference content using speaker information when the speaker information is received from the speaker detection unit, and controls the video conference content based on a manual speaker display request from the user terminal when the speaker information is not received;
an interpretation/translation unit that generates translated text corresponding to a language requested for interpretation/translation based on the received user voice;
a meeting minutes generating unit that generates meeting minutes using conference information generated during the conference and generates edited meeting minutes according to a user request, wherein the meeting minutes generating unit:
extracts conference information corresponding to the searched conference content and generates edited meeting minutes when a conference content search request is received from the user terminal,
extracts conference information corresponding to the selected speaker and generates edited meeting minutes when a speaker-limited conference content search request is received from the user terminal,
extracts conference information corresponding to the searched time and generates edited meeting minutes when a time-limited conference content search request is received from the user terminal,
generates conference participation information using the conference information and generates edited meeting minutes when a conference content statistics request is received from the user terminal, and
generates edited meeting minutes using the translated text when interpretation/translation information is received from the interpretation/translation unit;
a meeting attendance determining unit that receives attendance screens captured before and during the conference, determines whether they show the same person, generates attendance and absence information based on the determination, and provides an attendance sheet; and
a storage unit that stores text information, subtitle information, and interpretation/translation information generated during the conference, and corrects and stores typos and unconverted sentences.

(deleted)

A method in which a video conference service providing device provides a video conference service, the method comprising:
communicating with a user terminal and receiving video conference data, which is identification information including at least one of a user image, a user voice, a name, a character, a date of birth, and an IP address;
receiving the video conference data from the user terminal to generate video conference content;
performing an invitation confirmation procedure for determining whether participation in the generated video conference content is possible;
determining a speaker based on the voice intensity and voice duration of the user voice included in the video conference data, wherein the user voice is determined to be that of a speaker when the voice intensity exceeds a preset value and the duration of the user voice is longer than a preset time;
controlling the video conference content using speaker information when the speaker information is received from the speaker detection unit, and controlling the video conference content based on a manual speaker display request from the user terminal when the speaker information is not received;
generating translated text corresponding to a language requested for interpretation/translation based on the received user voice;
generating meeting minutes using conference information generated during the conference and generating edited meeting minutes according to a user request, including:
extracting conference information corresponding to the searched conference content and generating edited meeting minutes when a conference content search request is received from the user terminal,
extracting conference information corresponding to the selected speaker and generating edited meeting minutes when a speaker-limited conference content search request is received from the user terminal,
extracting conference information corresponding to the searched time and generating edited meeting minutes when a time-limited conference content search request is received from the user terminal,
generating conference participation information using the conference information and generating edited meeting minutes when a conference content statistics request is received from the user terminal, and
generating edited meeting minutes using the translated text when interpretation/translation information, including the translated text and interpreted voice in the language requested by the user, is received from the step of generating the translated text;
receiving attendance screens captured before and during the conference, determining whether they show the same person, generating attendance and absence information based on the determination, and providing an attendance sheet; and
storing text information, subtitle information, and interpretation/translation information generated during the conference, and correcting and storing typos and unconverted sentences.

(deleted)