KR20220106615A

KR20220106615A - Apparatus and method for group communication

Info

Publication number: KR20220106615A
Application number: KR1020210009746A
Authority: KR
Inventors: 강경모; 이재동; 한택진
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2022-07-29

Abstract

Disclosed are a device and method for conducting a group call smoothly by reducing the influence of noise when providing a group call involving many participants. According to an embodiment, a group call device supporting a group call in which a plurality of user terminals participate comprises: a group call room generation unit that creates a group call room in which the user terminals participate; a receiving unit that receives, from each of the user terminals, the user's voice and identification information of at least one other participant selected by the user; and a transmitting unit that mixes and synthesizes all voices received from the user terminals, attenuates the result to a predetermined level, and transmits the attenuated synthesized voice together with the individual voice of the at least one other participant selected by the user to each user terminal.

Description

Apparatus and method for group communication

The present invention relates to group call technology and, more particularly, to an apparatus and method that reduce the influence of noise so that group calls can be conducted smoothly.

With advances in communication technology, group call (or conference call) services now allow multiple remote participants to connect and talk at the same time. Such group calls are commonly used for meetings within companies, and various kinds of information may be shared during a call. Recently, as face-to-face meetings have decreased and remote work has increased due to COVID-19, group calls have become even more common.

In a group call, many people take part in the conversation, and recently such calls are often held by video, with participants seeing one another's faces. When many people participate in a group call and several of them speak at the same time, howling and echo can become so severe that the call cannot proceed properly. For this reason, group calls are typically run by having all participants turn off their microphones at the start and having only the current speaker turn theirs on, which prevents meetings from proceeding smoothly.

The present invention has been proposed to solve the above problems, and its object is to provide an apparatus and method that reduce the influence of noise so that a group call involving many participants can be conducted smoothly.

According to an embodiment, a group call device supporting a group call in which a plurality of user terminals participate includes: a group call room generation unit that creates a group call room in which the plurality of user terminals participate; a receiving unit that receives, from each of the plurality of user terminals, the user's voice and identification information of at least one other participant selected by the user; and a transmitting unit that mixes and synthesizes all voices received from the plurality of user terminals, attenuates the result to a predetermined level, and transmits the attenuated synthesized voice together with the individual voice of the at least one other participant selected by the user to each user terminal.

The receiving unit may receive the user's voice from each user terminal when the user's voice input to that user terminal is at or above a predetermined voice level.

The receiving unit may further receive, from each user terminal, video captured by a camera; the at least one other participant selected by the user may be a participant currently displayed on the screen of that user terminal; and the transmitting unit may transmit, to each user terminal, the video received from the user terminal of the at least one other participant selected by the user.

The receiving unit may receive the user's voice from each user terminal when the user's voice input to that user terminal is at or above a predetermined voice level and mouth movement is detected in the user's video.

According to an embodiment, a method for supporting, in a group call server, a group call in which a plurality of user terminals participate includes: creating a group call room in which the plurality of user terminals participate; receiving, from each of the plurality of user terminals, the user's voice and identification information of at least one other participant selected by the user; and mixing and synthesizing all voices received from the plurality of user terminals, attenuating the result to a predetermined level, and transmitting the attenuated synthesized voice together with the individual voice of the at least one other participant selected by the user to each user terminal.

According to the present invention, a participant's voice is transmitted to the other participants only when that participant is actually speaking, so unnecessary noise is not passed on to others, enabling a smooth group call free of noise.

According to the present invention, each participant receives the voices of the other participants he or she has selected unaltered, while the voices of all participants are synthesized and attenuated to a predetermined level or lower before transmission, giving each participant the impression of actually being present at the group meeting.

FIG. 1 is a block diagram illustrating a group call system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the detailed configuration of a group call application of a user terminal according to an embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of a group call server according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method by which a user terminal transmits voice and video for a group call according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method by which a group call server transmits voice to the user terminal of each participant according to an embodiment of the present invention.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can easily implement its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they may unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a group call system according to an embodiment of the present invention. Referring to FIG. 1, the group call system includes user terminals 110, a group call server 120, and a communication network 130 connecting them. Here, the communication network 130 refers to a connection structure through which nodes such as terminals and servers can exchange information; examples include, but are not limited to, the Internet, a local area network (LAN), a wireless local area network (WLAN), a wide area network (WAN), a personal area network (PAN), 3G, 4G, LTE, and VoLTE.

Referring to FIG. 1, the user terminal 110 is a communication device that can access the remote group call server 120 through the communication network 130. The user terminal 110 may be, for example, any kind of portable, mobile handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC; it may also be a fixed communication terminal such as a personal computer.

The user terminal 110 may include a group call application, a microphone and a speaker for the group call, and a communication module. It transmits the user's voice picked up by the microphone to the group call server 120 through the communication network 130, and receives the voices of other participants from the group call server 120 and outputs them through the speaker. The user terminal 110 also includes a camera; it transmits the user's video captured by the camera to the group call server 120 through the communication network 130, and receives the video of other participants from the group call server 120 and outputs it through a display device.

The user terminal 110 may decide whether to transmit voice by analyzing the voice input from the user and the video captured by the camera. Preferably, the user terminal 110 analyzes the voice input through the microphone and the video received from the camera: if the user's voice is at or above a threshold voice level and the user's mouth is determined to be moving, it transmits both the user's voice and video to the group call server 120; if the voice is below the threshold voice level or the mouth is not moving, it transmits only the video and withholds the voice. In other words, voice is transmitted only while the user is actually speaking, so unnecessary background noise around the user does not reach the other participants.
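The terminal-side decision above is a two-condition speech gate. The sketch below is illustrative only and rests on assumptions not stated in the patent: audio arrives as frames of normalized float samples, the "voice level" is measured as RMS in dBFS, the threshold of -40 dBFS is an arbitrary placeholder, and `mouth_moving` is a boolean supplied by some hypothetical lip-movement detector (mouth detection itself is not shown).

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of one frame of normalized samples (-1.0..1.0), in dBFS."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10(mean square) is the same as 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def should_send_voice(samples: Sequence[float], mouth_moving: bool,
                      threshold_dbfs: float = -40.0) -> bool:
    """Voice passes the gate only if it is loud enough AND the lips are moving."""
    return rms_dbfs(samples) >= threshold_dbfs and mouth_moving


def build_uplink(samples: Sequence[float], video_frame: bytes, mouth_moving: bool,
                 threshold_dbfs: float = -40.0) -> dict:
    """Video is always sent; voice is attached only when the gate passes."""
    payload = {"video": video_frame}
    if should_send_voice(samples, mouth_moving, threshold_dbfs):
        payload["voice"] = list(samples)
    return payload
```

Requiring both conditions means that loud background noise alone (no lip movement) and silent lip movement alone both leave the voice channel closed.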

The group call application installed on the user terminal 110 provides a screen interface for the group call. This interface may divide one screen into a plurality of areas and display a different participant's video in each area. Depending on its settings, the application may divide a screen into at most a predetermined maximum number of areas. For example, if 100 people join a group call and the application can divide a screen into at most 10 areas, it presents 10 pages of 10 participants each and turns pages in response to user input.
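A minimal sketch of the paging described above, assuming participants are identified by string IDs and that an out-of-range page request is simply clamped (the patent does not specify this). The IDs returned by `visible_ids` (a hypothetical helper) are the on-screen participants, which in the preferred embodiment are also the ones reported to the server as the user's selection.

```python
from typing import List, Sequence


def paginate(participant_ids: Sequence[str], max_tiles: int) -> List[List[str]]:
    """Split participant IDs into pages of at most max_tiles tiles each."""
    if max_tiles <= 0:
        raise ValueError("max_tiles must be positive")
    return [list(participant_ids[i:i + max_tiles])
            for i in range(0, len(participant_ids), max_tiles)]


def visible_ids(participant_ids: Sequence[str], max_tiles: int, page: int) -> List[str]:
    """IDs shown on the requested page; out-of-range pages are clamped."""
    pages = paginate(participant_ids, max_tiles)
    if not pages:
        return []
    page = max(0, min(page, len(pages) - 1))
    return pages[page]
```

With 100 IDs and 10 tiles per screen this yields 10 full pages, matching the example in the text; with 99 other participants the last page would hold 9.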

The group call server 120 provides a voice group call service, or a video group call service combining voice and video, among a plurality of user terminals 110. The group call server 120 may handle events such as inviting participants and participants leaving, and may provide functions for sending and receiving data such as files and multimedia during a group call. When a group call request is received, the group call server 120 creates a group call room and provides the call service among the plurality of user terminals 110. When creating the group call room, the group call server 120 may also generate a secure call key for a secure group call and send it to the participating user terminals 110, so that each user terminal 110 encrypts the call data it sends and receives using that key.

The group call server 120 stores and manages the identification information of the participants in the group call. When it receives from each user terminal 110 the identification information of the other participants selected by that terminal's user, the group call server 120 transmits to that user terminal 110 a synthesized voice, obtained by combining the voices of all participants and attenuating the result to a predetermined level, together with the individual voice of each participant whose identification information was selected. The user of each user terminal 110 therefore hears the selected participants clearly and everyone else quietly, which gives a sense of being physically present while letting the user focus on the voices he or she needs. Preferably, the group call server 120 receives from each user terminal 110 the identification information of the participants currently displayed on that terminal's screen and treats it as the identification information of the participants selected by the user.
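The per-terminal downlink described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: each participant's voice for the current interval is a time-aligned list of normalized float samples keyed by participant ID, the "predetermined level" is modeled as a fixed attenuation in dB (-20 dB is an arbitrary placeholder), and the mix is hard-clipped to [-1, 1]. `downlink_for` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Sequence


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def mix_and_attenuate(frames: Dict[str, Sequence[float]],
                      attenuation_db: float = -20.0) -> List[float]:
    """Sum every participant's frame sample by sample, attenuate, and clip to [-1, 1]."""
    if not frames:
        return []
    length = min(len(f) for f in frames.values())  # assumes time-aligned frames
    gain = db_to_gain(attenuation_db)
    mixed = []
    for i in range(length):
        s = sum(f[i] for f in frames.values()) * gain
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed


def downlink_for(frames: Dict[str, Sequence[float]], selected_ids: Iterable[str],
                 attenuation_db: float = -20.0) -> dict:
    """Attenuated mix of everyone plus unattenuated streams of the selected participants."""
    return {
        "background": mix_and_attenuate(frames, attenuation_db),
        # Participants whose voice was gated off this interval have no entry and are skipped.
        "individual": {pid: list(frames[pid]) for pid in selected_ids if pid in frames},
    }
```

For example, with `{"a": [0.5, 0.5], "b": [0.5, -0.5]}` and -20 dB (a gain of 0.1), the background mix is approximately `[0.1, 0.0]`, while a terminal that selected `"b"` also receives `[0.5, -0.5]` at full level.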

FIG. 2 is a diagram illustrating the detailed configuration of the group call application of a user terminal according to an embodiment of the present invention. Referring to FIG. 2, the group call application 200 may be installed in the memory of the user terminal and executed by at least one processor. As shown in FIG. 2, the group call application 200 includes a receiving unit 210, an output unit 220, an analysis unit 230, and a transmitting unit 240.

The receiving unit 210 receives audio from the microphone of the user terminal 110 and video from the camera. The receiving unit 210 also receives the voices and video of the other participants in the group call from the group call server 120 through the communication network 130. The voice received from the group call server 120 includes a synthesized voice combining the voices of all participants in the group call and the individual voice of each participant currently displayed on the screen of the display device.

The output unit 220 may receive the user's own video from the receiving unit 210 and output it to the display device, and may receive the other participants' voices and video from the receiving unit 210, outputting the voices through the speaker and the video on the display device. Preferably, the output unit 220 divides the screen of the display device into a plurality of areas according to the user's input and displays a different participant's video in each area.

The analysis unit 230 receives the user's voice and video from the receiving unit 210 and determines whether the user is speaking. Specifically, if analysis shows that the user's voice is at or above the threshold voice level and that the user's mouth is moving in the video, the analysis unit 230 passes both the voice and the video to the transmitting unit 240; if the voice is below the threshold voice level or the mouth is not moving, it passes only the video and withholds the voice.

The transmitting unit 240 transmits the user's video, or the user's voice and video, received from the analysis unit 230 to the group call server 120 through the communication network 130. When transmitting voice and video data, the transmitting unit 240 may include the user's identification information. The transmitting unit 240 may also transmit to the group call server 120 the identification information of the participants that the output unit 220 is currently displaying on the screen of the display device.
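The two kinds of uplink data described above (media tagged with the sender's ID, and the list of on-screen participant IDs) might be modeled as below. The message names, field names, and the JSON encoding of the selection message are all hypothetical; the patent does not define a wire format.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaPacket:
    """Uplink media from one terminal, tagged with the sender's identification."""
    sender_id: str
    video: bytes
    voice: Optional[bytes] = None  # None when the speech gate withheld the voice


@dataclass
class DisplaySelection:
    """IDs of the participants currently shown on this terminal's screen."""
    sender_id: str
    displayed_ids: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps({
            "type": "display_selection",
            "sender_id": self.sender_id,
            "displayed_ids": self.displayed_ids,
        })

    @classmethod
    def from_json(cls, text: str) -> "DisplaySelection":
        data = json.loads(text)
        return cls(data["sender_id"], list(data["displayed_ids"]))
```

Keeping the selection as a separate message lets the terminal resend it only when the user turns a page, rather than with every media packet.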

FIG. 3 is a diagram showing the configuration of a group call server according to an embodiment of the present invention. The group call server 120 may include memory, one or more processors (CPUs), and communication circuitry, which communicate over one or more communication buses or signal lines. Referring to FIG. 3, the group call server 120 includes a group call room generation unit 310, a receiving unit 320, and a transmitting unit 330, which may be implemented in software stored in the memory and executed by the one or more processors, or in a combination of software and hardware.

The group call room generation unit 310 creates a group call room in which multiple participants take part. On receiving a room-creation request from any one user terminal 110, it creates the group call room, and it admits the user terminals 110 of other participants to that room when they connect. At the request of the user terminal 110 that created the room, the group call room generation unit 310 may invite other participants by sending an invitation message containing an invitation link to their user terminals 110. Here, a group call room may mean a single session to which every user terminal 110 is connected. The group call room generation unit 310 manages the room by mapping the participants' identification information to the room's identification information.
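A minimal in-memory sketch of the room management described above, also folding in the per-room secure call key mentioned for the group call server 120. Everything here is an assumption for illustration: the class and method names are hypothetical, the invitation URL base is a placeholder, and the key is merely generated with Python's `secrets` module and handed to joiners. Secure key delivery and the actual encryption of call data are not shown.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GroupCallRoom:
    room_id: str
    creator_id: str
    call_key: bytes  # per-room secret shared with participants
    participants: Set[str] = field(default_factory=set)


class RoomRegistry:
    """Maps room IDs to the IDs of the participants in each room."""

    def __init__(self, base_url: str = "https://example.invalid/join"):
        self._rooms: Dict[str, GroupCallRoom] = {}
        self._base_url = base_url  # placeholder invitation endpoint

    def create_room(self, creator_id: str) -> GroupCallRoom:
        room = GroupCallRoom(secrets.token_urlsafe(8), creator_id,
                             secrets.token_bytes(32), {creator_id})
        self._rooms[room.room_id] = room
        return room

    def invite_link(self, room_id: str, requester_id: str) -> str:
        room = self._rooms[room_id]
        if requester_id != room.creator_id:
            raise PermissionError("only the room creator may send invitations")
        return f"{self._base_url}/{room_id}"

    def join(self, room_id: str, participant_id: str) -> bytes:
        """Admit a participant and return the room's call key."""
        room = self._rooms[room_id]
        room.participants.add(participant_id)
        return room.call_key

    def leave(self, room_id: str, participant_id: str) -> None:
        self._rooms[room_id].participants.discard(participant_id)

    def participants(self, room_id: str) -> Set[str]:
        return set(self._rooms[room_id].participants)
```

Restricting invitations to the creator follows the text's "at the request of the user terminal that created the room"; a real service might allow other roles to invite as well.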

The receiving unit 320 receives voice and/or video for the group call from the user terminals 110 of the participants through the communication network 130. Preferably, the voice and/or video data received from a user terminal 110 contains the participant's identification information, so the server can determine which participant it came from. The receiving unit 320 may also separately receive from each user terminal 110 the identification information of the participants currently displayed on that terminal's screen.

The transmitting unit 330 transmits the voice and/or video of each participant received by the receiving unit 320 to the user terminals 110 of the other participants. For example, when 100 participants are in a group call, the user terminal 110 of the first participant receives the voice and/or video of the remaining 99 participants.

When transmitting the other participants' voice and/or video to each participant's user terminal 110, the transmitting unit 330 individually sends the voice and/or video of the participants corresponding to the identification information, received from that user terminal 110, of the participants currently displayed on its screen. Alongside these individual voices, the transmitting unit 330 sends each participant's user terminal 110 an overall synthesized voice, produced by combining the voices of all participants in the group call and attenuating the result to a predetermined level.

FIG. 4 is a flowchart illustrating a method by which a user terminal transmits voice and video for a group call according to an embodiment of the present invention.

Referring to FIG. 4, in step S401, the user terminal 110 receives the user's voice through the microphone and the user's video from the camera.

In step S402, the user terminal 110 analyzes the received voice to determine whether it is at or above the threshold voice level. If the user's voice is below the threshold voice level, then in step S405 the user terminal 110 transmits only the user's video to the group call server 120, without the user's voice.

If the check in step S402 shows that the user's voice is at or above the threshold voice level, then in step S403 the user terminal 110 analyzes the user's video to determine whether the user's mouth is moving. If the mouth is not moving, then in step S405 the user terminal 110 transmits only the user's video to the group call server 120, without the user's voice.

If the check in step S403 shows that the user's mouth is moving, then in step S404 the user terminal 110 transmits the user's voice and video together to the group call server 120.

According to the embodiment described with reference to FIG. 4, voice is transmitted only while the user is speaking and is withheld otherwise, so noise is not needlessly sent to the other participants in the group call. This enables a smooth group call.

FIG. 5 is a flowchart illustrating a method by which a group call server transmits voice to each participant's user terminal according to an embodiment of the present invention.

Referring to FIG. 5, in step S501, the group call server 120 receives from a first user terminal 110 the identification information (e.g., IDs) of the participants currently displayed on that terminal's screen.

In step S502, the group call server 120 mixes and synthesizes the voices of all participants in the group call, including the user of the first user terminal 110, and attenuates the synthesized voice to a predetermined level.

In step S503, the group call server 120 transmits to the first user terminal 110 the attenuated synthesized voice together with the voice of each participant corresponding to the identification information received in step S501.

According to the embodiment described with reference to FIG. 5, the user of the first user terminal 110 hears the attenuated voices of all participants as quiet background sound, while hearing the voices of the participants currently displayed on the screen of the first user terminal 110 in their original form without attenuation. The user thus feels as if actually attending a group meeting and can clearly hear the participants of interest.

Although this specification contains many specific features, they should not be construed as limiting the scope of the invention or the claims. Features described in separate embodiments herein may also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment may be implemented separately in multiple embodiments or in any suitable combination.

Although operations are depicted in the drawings in a particular order, this should not be understood as requiring that they be performed in the order shown or in sequence, or that all illustrated operations be performed, to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, the separation of system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The program components and systems described above may generally be packaged together in a single software product or across multiple software products.

The method of the present invention described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Since this process can readily be carried out by a person of ordinary skill in the art, it is not described in further detail.

Since various substitutions, modifications, and changes can be made to the present invention described above by a person of ordinary skill in the art without departing from its technical spirit, the invention is not limited to the embodiments described above or the accompanying drawings.

110: user terminal
120: group call server
130: communication network
210: receiving unit
220: output unit
230: analysis unit
240: transmitting unit
310: group call room generation unit
320: receiving unit
330: transmitting unit

Claims

1. A group call device supporting a group call in which a plurality of user terminals participate, the device comprising:
a group call room generation unit configured to create a group call room in which the plurality of user terminals participate;
a receiving unit configured to receive, from each of the plurality of user terminals, the user's voice and identification information of at least one other participant selected by the user; and
a transmitting unit configured to mix and synthesize all voices received from the plurality of user terminals, attenuate the synthesized voice to a predetermined level, and transmit the attenuated synthesized voice and the individual voice of the at least one other participant selected by the user to each user terminal.

2. The group call device of claim 1, wherein
the receiving unit
receives the user's voice from each user terminal when the user's voice input to that user terminal is at or above a predetermined voice level.

3. The group call device of claim 2, wherein
the receiving unit further receives, from each user terminal, video captured by a camera;
the at least one other participant selected by the user is a participant being displayed on the screen of the user terminal; and
the transmitting unit transmits, to each user terminal, the video received from the user terminal of the at least one other participant selected by the user.

4. The group call device of claim 3, wherein
the receiving unit
receives the user's voice from each user terminal when the user's voice input to that user terminal is at or above the predetermined voice level and mouth movement is detected in the user's video.