KR101334015B1

KR101334015B1 - Portable terminal having function of classifying speaker during multilateral image communication and method for classifying speaker during multilateral image communication

Info

Publication number: KR101334015B1
Application number: KR1020070101277A
Authority: KR
Inventors: 김정호; 김미영
Original assignee: 주식회사 케이티
Priority date: 2007-10-09
Filing date: 2007-10-09
Publication date: 2013-11-28
Also published as: KR20090036226A

Abstract

다자간 화상 통화시 화자의 영상을 용이하게 구분할 수 있는 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기 및 다자간 화상 통화시 화자 구분 방법이 개시된다. 다자간 화상 통화에 참여하는 각각의 화자에 상응하는 음량 가중치 정보가 저장된 저장부와, 소정의 화자로부터 음성 데이터가 제공되면 소정 화자에 상응하는 음량 가중치를 저장부로부터 독출한 후 독출된 음량 가중치에 기초하여 음량 제어 신호를 생성하는 제어부 및 음량 제어 신호에 기초하여 소정 화자의 음성이 출력되는 좌측 채널의 음량 및 우측 채널의 음량을 각각 조절하는 오디오 코덱을 포함한다. 따라서, 헤드셋을 통해 화자의 음성을 청취하는 것만으로 디스플레이부에 표시된 복수의 화상 통화 참여자의 영상 중 화자의 영상을 용이하게 구분할 수 있다.Disclosed are a portable terminal having a speaker discrimination function for a multi-party video call and a speaker discrimination method for a multi-party video call. A storage unit storing volume weight information corresponding to each speaker participating in the multi-party video call, and based on the volume weight read after reading the volume weight corresponding to the predetermined speaker from the storage unit when the voice data is provided from the predetermined speaker. And a control unit for generating a volume control signal, and an audio codec for adjusting the volume of the left channel and the volume of the right channel, respectively, based on the volume control signal. Therefore, the video of the speaker can be easily distinguished from the video of the plurality of video call participants displayed on the display unit only by listening to the voice of the speaker through the headset.

화상 통화, 다자간, 음성, 구분 Video call, multi-party, voice, separate

Description

PORTABLE TERMINAL HAVING FUNCTION OF CLASSIFYING SPEAKER DURING MULTILATERAL IMAGE COMMUNICATION AND METHOD FOR CLASSIFYING SPEAKER DURING MULTILATERAL IMAGE COMMUNICATION}

본 발명은 휴대용 단말기에 관한 것으로, 더욱 상세하게는 다자간 화상 통화 기능을 구비한 휴대용 단말기에 적용될 수 있는 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기 및 다자간 화상 통화시 화자 구분 방법에 관한 것이다.The present invention relates to a portable terminal, and more particularly, to a portable terminal having a speaker discrimination function for a multi-party video call and a speaker discrimination method for a multi-party video call, which can be applied to a portable terminal having a multi-party video call function.

이동 통신 기술의 세대 진화로 인해 본격적인 화상 통화가 상용화되면서 기존의 음성 통화 및 단문 메시지 송수신 위주의 이동통신 서비스에서 벗어나 장소에 관계없이 휴대용 단말기를 이용하여 상대방의 모습을 보면서 통화할 수 있게 되었다.With the evolution of mobile communication technology, full-scale video telephony has become commercially available, and it is now possible to make a call while looking at the other party using a portable terminal regardless of a place, away from the existing voice communication and mobile communication service focused on short message transmission.

또한, 최근에는 적어도 두 명 이상이 동시에 화상 통화에 참여하여 서로 상대방의 모습을 보면서 통화할 수 있는 다자간 화상 통화가 가능하게 되어 각각 다른 장소에 있는 사람들이 화상 통화를 통해 개인적인 친목을 도모하거나, 다자간 화상 통화를 원격 화상 회의 등과 같은 업무용 목적으로 사용하고 있다.In addition, at least two people have been able to participate in a video call at the same time and make a multi-party video call where they can see each other while viewing each other. Video calls are used for business purposes such as teleconferences.

일반적인 다자간 화상 통화에서는 화상 통화에 참여하는 모든 참여자의 영상이 휴대용 단말기의 디스플레이부의 구분된 영역에 표시되고, 휴대용 단말기에 구비된 스피커 또는 헤드셋을 통해 모든 화상 통화 참여자의 음성이 동일한 크기로 출력됨으로써 다자간 화상 통화가 수행된다.In a typical multi-party video call, the video of all participants participating in the video call is displayed in a separate area of the display of the mobile terminal, and the voices of all the video call participants are output in the same size through a speaker or a headset provided in the mobile terminal. Video calls are made.

상기와 같은 종래의 다자간 화상 통화에서는 휴대용 단말기의 디스플레이부에 표시 영역을 구분하여 화상 통화에 참여하는 각 참여자의 영상을 각각 다른 표시 영역에 표시하기 때문에 각 참여자의 영상을 용이하게 구분할 수 있으나, 상기 영상과 동기된 각 참여자의 음성은 스피커 또는 헤드셋을 통해 모두 동일한 크기의 음량으로 출력되기 때문에 출력되는 음성에 상응하는 화자의 영상을 구분하기가 용이하지 않다는 단점이 있다.In the conventional multi-party video call as described above, since the display area of the portable terminal divides the display area and displays the video of each participant participating in the video call on a different display area, the video of each participant can be easily distinguished. Since the voice of each participant synchronized with the video is output at the same volume through the speaker or the headset, it is difficult to distinguish the video of the speaker corresponding to the output voice.

따라서, 본 발명의 제1 목적은 다자간 화상 통화시 화자의 영상을 용이하게 구분할 수 있는 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기를 제공하는 것이다.Accordingly, a first object of the present invention is to provide a portable terminal having a speaker discrimination function during a multi-party video call, which can easily distinguish the video of the speaker during the multi-party video call.

또한, 본 발명의 제2 목적은 다자간 화상 통화시 화자의 영상을 용이하게 구분할 수 있는 다자간 화상 통화시 화자 구분 방법을 제공하는 것이다.In addition, a second object of the present invention is to provide a method for distinguishing a speaker during a multi-party video call, which can easily distinguish the video of the speaker during the multi-party video call.

상술한 본 발명의 제1 목적을 달성하기 위한 본 발명의 일측면에 따른 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기는 다자간 화상 통화에 참여하는 각각의 화자에 상응하는 음량 가중치 정보가 저장된 저장부와, 소정의 화자로부터 음성 데이터가 제공되면 상기 소정 화자에 상응하는 음량 가중치를 상기 저장부로부터 독출한 후 독출된 상기 음량 가중치에 기초하여 음량 제어 신호를 생성하는 제어부 및 상기 음량 제어 신호에 기초하여 상기 소정 화자의 음성이 출력되 는 좌측 채널의 음량 및 우측 채널의 음량을 각각 조절하는 오디오 코덱을 포함한다. 상기 음량 가중치는 상기 소정 화자의 음성이 출력되는 좌측 채널 및 우측 채널에 대해 각각 설정될 수 있다. 상기 제어부는 상기 소정의 화자로부터 음성 데이터가 제공되면 상기 음성 데이터에 포함된 호 구분자를 추출하여 상기 다자간 화상 통화에 참여하는 각각의 화자 중에서 상기 소정의 화자를 식별할 수 있다. 상기 오디오 코덱은 상기 음량 제어 신호에 기초하여 상기 좌측 채널의 음량을 조절하는 제1 증폭부 및 상기 음량 제어 신호에 기초하여 상기 우측 채널의 음량을 조절하는 제2 증폭부를 포함할 수 있다. 상기 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기는 상기 좌측 채널의 음량 및 상기 우측 채널의 음량이 각각 조절된 상기 소정 화자의 음성을 제공받는 헤드셋 연결부를 더 포함할 수 있다.According to an aspect of the present invention, a portable terminal having a speaker discrimination function according to an aspect of the present invention for storing the volume weight information corresponding to each speaker participating in the multi-party video call is stored. And a control unit for generating a volume control signal based on the volume weight value read out from the storage unit when the voice data is provided from a predetermined speaker and the volume weight corresponding to the predetermined speaker. And an audio codec for adjusting the volume of the left channel and the volume of the right channel through which the voice of the predetermined speaker is output. The volume weights may be set for the left channel and the right channel through which the voice of the predetermined speaker is output. When the voice data is provided from the predetermined speaker, the controller may extract the call separator included in the voice data to identify the predetermined speaker from each speaker participating in the multi-party video call. The audio codec may include a first amplifier for adjusting the volume of the left channel based on the volume control signal and a second amplifier for adjusting the volume of the right channel based on the volume control signal. The portable terminal having a speaker discrimination function during the multi-party video call may further include a headset connection unit for receiving the voice of the predetermined speaker in which the volume of the left channel and the volume of the right channel are respectively adjusted.

또한, 본 발명의 제2 목적을 달성하기 위한 본 발명의 일측면에 따른 다자간 화상 통화시 화자의 영상을 용이하게 구분할 수 있는 다자간 화상 통화시 화자 구분 방법은 소정의 화자로부터 음성 데이터가 제공되면 상기 소정 화자에 상응하는 음량 가중치를 독출하는 단계와, 독출된 상기 음량 가중치에 기초하여 음량 제어 신호를 생성하는 단계 및 생성된 상기 음량 제어 신호에 기초하여 상기 소정 화자의 음성이 출력되는 좌측 채널의 음량 및 우측 채널의 음량을 각각 조절하는 단계를 포함한다. 상기 음량 가중치는 상기 소정 화자의 음성이 출력되는 좌측 채널 및 우측 채널에 대해 각각 설정될 수 있다. 상기 소정의 화자로부터 음성 데이터가 제공되면 상기 소정 화자에 상응하는 음량 가중치를 독출하는 단계는, 상기 소정의 화자로부터 음성 데이터가 제공되면 상기 음성 데이터에 포함된 호 구분자를 추출 하여 상기 다자간 화상 통화에 참여하는 각각의 화자 중에서 상기 소정의 화자를 식별하는 단계를 포함할 수 있다. 상기 다자간 화상 통화시 화자 구분 방법은 상기 좌측 채널의 음량 및 상기 우측 채널의 음량이 각각 조절된 상기 소정 화자의 음성을 헤드셋 연결부에 제공하는 단계를 더 포함할 수 있다.In addition, the method of distinguishing the speaker during the multi-party video call, which can easily distinguish the video of the speaker during the multi-party video call according to an aspect of the present invention for achieving the second object of the present invention, the voice data is provided from a predetermined speaker Reading a volume weight corresponding to a predetermined speaker; generating a volume control signal based on the read volume weight; and generating a volume control signal based on the generated volume control signal. Adjusting the volume of the volume and the right channel, respectively. The volume weights may be set for the left channel and the right channel through which the voice of the predetermined speaker is output. When the voice data is provided from the predetermined speaker, the step of reading a volume weight corresponding to the predetermined speaker may include extracting a call separator included in the voice data when the voice data is provided from the predetermined speaker. And identifying the predetermined speaker among each speaker participating in the. The method of distinguishing a speaker during the multi-party video call may further include providing a voice of the predetermined speaker in which the volume of the left channel and the volume of the right channel are respectively adjusted to the headset connection unit.

상기와 같은 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기 및 다자간 화상 통화시 화자 구분 방법은 소정의 화자로부터 음성 데이터가 제공되면 상기 음성 데이터로부터 상기 소정 화자를 식별하고, 식별된 화자에 상응하는 음량 가중치를 저장부로부터 독출한다. 그리고 독출된 음량 가중치에 기초하여 상기 소정 화자의 음량을 조절한 후 헤드셋 연결부를 통해 출력한다.The portable terminal having the speaker discrimination function in the multi-party video call and the speaker discrimination method in the multi-party video call identify the predetermined speaker from the voice data when the voice data is provided from a predetermined speaker and correspond to the identified speaker. The volume weight is read from the storage. The volume of the predetermined speaker is adjusted based on the read volume weight and then output through the headset connection unit.

따라서, 헤드셋을 통해 화자의 음성을 청취하는 것만으로 디스플레이부에 표시된 복수의 화상 통화 참여자의 영상 중 화자의 영상을 용이하게 구분할 수 있다.Therefore, the video of the speaker can be easily distinguished from the video of the plurality of video call participants displayed on the display unit only by listening to the voice of the speaker through the headset.

본 발명은 다양한 변경을 가할 수 있고 여러 가지 실시예를 가질 수 있는 바, 특정 실시예들을 도면에 예시하고 상세하게 설명하고자 한다.While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail.

그러나, 이는 본 발명의 특정한 실시 형태에 대해 한정하려는 것이 아니며, 본 발명의 사상 및 기술 범위에 포함되는 모든 변경, 균등물 내지 대체물을 포함하는 것으로 이해되어야 한다.It should be understood, however, that it is not intended to be limited to the particular embodiments of the invention but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

그리고, 제1, 제2 등의 용어는 다양한 구성요소들을 설명하는데 사용될 수 있지만, 상기 구성요소들은 상기 용어들에 의해 한정되어서는 안 된다. 상기 용어 들은 하나의 구성요소를 다른 구성요소로부터 구별하는 목적으로만 사용된다. 예를 들어, 본 발명의 권리 범위를 벗어나지 않으면서 제1 구성요소는 제2 구성요소로 명명될 수 있고, 유사하게 제2 구성요소도 제1 구성요소로 명명될 수 있다. 및/또는 이라는 용어는 복수의 관련된 기재된 항목들의 조합 또는 복수의 관련된 기재된 항목들 중의 어느 항목을 포함한다.The terms first, second, etc. may be used to describe various elements, but the elements should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. And / or < / RTI > includes any combination of a plurality of related listed items or any of a plurality of related listed items.

이하, 첨부한 도면들을 참조하여, 본 발명의 바람직한 실시예를 보다 상세하게 설명하고자 한다. 도면상의 동일한 구성요소에 대해서는 동일한 참조부호를 사용하고 동일한 구성요소에 대해서 중복된 설명은 생략한다.Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same constituent elements in the drawings and redundant explanations for the same constituent elements are omitted.

이하, 본 발명의 실시예에서는 음성 및/또는 음향이 두 개의 채널 즉, 좌측 채널 및 우측 채널을 통해 제공되는 스테레오 타입의 헤드셋을 이용하여 다자간 화상통화를 수행하는 것으로 예를 들어 설명한다.Hereinafter, an embodiment of the present invention will be described as an example of performing a multi-party video call using a headset of a stereo type in which voice and / or sound are provided through two channels, that is, a left channel and a right channel.

도 1은 본 발명의 일 실시예에 따른 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기의 구성을 나타내는 블록도이고, 도 2는 도 1에 도시된 오디오 코덱의 상세한 구성을 나타내는 블록도이다. 또한, 도 3은 본 발명의 일 실시예에 따른 다자간 화상 통화시 디스플레이부에 표시되는 사용자 인터페이스 화면을 나타내고, 도 4는 본 발명의 일 실시예에 따른 다자간 화상 통화시 화자 구분 기능을 위해 설정된 화자별 가중치를 나타낸다.1 is a block diagram showing the configuration of a portable terminal having a speaker discrimination function in a multi-party video call according to an embodiment of the present invention, Figure 2 is a block diagram showing a detailed configuration of the audio codec shown in FIG. 3 illustrates a user interface screen displayed on a display unit during a multi-party video call according to an embodiment of the present invention, and FIG. 4 illustrates a speaker set for a speaker classification function during a multi-party video call according to an embodiment of the present invention. Represents a star weight.

도 1 내지 도 4를 참조하면, 본 발명의 일 실시예에 따른 휴대용 단말기는 카메라부(111), 디스플레이부(113), 그래픽 처리부(115), 비디오 코덱(117), 헤드셋 연결부(120), 마이크(121), 스피커(123), 오디오 코덱(130), 제어부(140), 저장 부(150), 키입력부(160) 및 무선 송수신부(170)를 포함한다.1 to 4, a portable terminal according to an embodiment of the present invention includes a camera unit 111, a display unit 113, a graphic processor 115, a video codec 117, a headset connection unit 120, The microphone 121, the speaker 123, the audio codec 130, the controller 140, the storage 150, a key input unit 160, and a wireless transceiver 170 are included.

카메라부(111) 구성은 공지되어 있으므로 상세하게 도시하지는 않았지만 렌즈, 이미지 센서, 아날로그-디지털 컨버터(analog-digital converter)를 포함하고, 이미지 촬영 또는 화상 통화 기능이 실행되면 렌즈를 통해 입사된 피사체의 광학적 신호를 이미지 센서를 통해 입사된 광학적 신호에 상응하는 전기적 신호로 변환하고, 아날로그 디지털 컨버터를 통해 상기 변환된 전기적 신호를 이에 대응되는 디지털 영상으로 변환한 후 그래픽 처리부(115)에 제공한다.Although the configuration of the camera unit 111 is well-known, it is not shown in detail, but includes a lens, an image sensor, and an analog-digital converter. When the image capturing or video call function is executed, The optical signal is converted into an electrical signal corresponding to the optical signal incident through the image sensor, and the converted electrical signal is converted into a digital image corresponding thereto through an analog-digital converter and then provided to the graphic processor 115.

여기서, 상기 카메라부(111)는 제어부(140)의 제어에 따라 초당 15 프레임 또는 초당 30 프레임의 영상을 촬영할 수 있다.Here, the camera unit 111 may capture an image of 15 frames per second or 30 frames per second under the control of the controller 140.

디스플레이부(113)는 예를 들어 액정표시장치(LCD: Liquid Crystal Display) 또는 유기전계발광장치(OLED: Organic Light Emitting Diodes)와 같은 표시 장치가 될 수 있고, 그래픽 처리부(115)로부터 제공된 영상신호에 기초하여 휴대용 단말기의 메뉴, 동작 상태, 응용프로그램 실행 화면 등과 같은 사용자 인터페이스를 표시한다.The display unit 113 may be, for example, a display device such as a liquid crystal display (LCD) or an organic light emitting diode (OLED), and an image signal provided from the graphic processor 115. Based on the display, the user interface such as a menu, an operation state, an application program execution screen, and the like of the portable terminal are displayed.

특히, 디스플레이부(113)는 다자간 화상 통화시에는 그래픽 처리부(115)로부터 화상 통화 참여자들 및/또는 사용자의 영상을 제공받고, 제공받은 화상 통화 참여자 및/또는 사용자의 영상을 각각 지정된 소정의 표시 영역에 표시한다. 또한, 사진 촬영시에는 미리보기 영상(preview image) 및 촬영된 영상을 표시한다.In particular, the display 113 receives a video call participant and / or a user's video from the graphic processor 115 during a multi-party video call, and displays a predetermined image of the video call participant and / or the user. Mark the area. In addition, when taking a picture, a preview image and a captured image are displayed.

예를 들어, 디스플레이부(113)는 도 4에 도시된 바와 같이 다자간 화상 통화가 시작되면 표시 영역을 화상 통화 참여자의 수에 기초하여 소정 개수의 영역으로 구분한 후 구분된 각각의 영역에 화상 통화 참여자 각각의 영상을 표시한다. 도 4에서는 휴대용 단말기의 사용자를 포함하여 5명(A, B, C, D 및 사용자)이 화상 통화에 참여한 것으로 예를 들어 도시하였다. For example, as shown in FIG. 4, when the multi-party video call is started, the display unit 113 divides the display area into a predetermined number of areas based on the number of video call participants, and then displays the video call in each of the divided areas. Each participant's video is displayed. In FIG. 4, for example, five people (A, B, C, D and users) including a user of a portable terminal participate in a video call.

그래픽 처리부(115)는 카메라부(111)로부터 디지털 영상을 제공받고, 디지털 영상의 컬러 포맷을 변환함으로써 영상의 크기를 축소시킨 후 비디오 코덱(117)에 제공한다. 예를 들어 그래픽 처리부(115)는 카메라부(111)로부터 RGB(Red, Green, Blue) 형태의 로(raw) 데이터를 제공받고 이를 인코딩하여 YCbCr 420 포맷으로 변환할 수 있다.The graphic processor 115 receives the digital image from the camera 111, reduces the size of the image by converting the color format of the digital image, and provides the digital image to the video codec 117. For example, the graphic processor 115 may receive raw data in the form of RGB (Red, Green, Blue) from the camera unit 111, encode the raw data, and convert the raw data into the YCbCr 420 format.

또한, 그래픽 처리부(115)는 비디오 코덱(117)으로부터 디코딩된 영상(예를 들면, 화상 통화 참여자들의 영상)을 제공받고, 제공받은 영상이 디스플레이부(113)의 소정 영역에 표시될 수 있도록 해상도, 표시 위치, 프레임 레이트 등과 같은 그래픽 처리를 수행한 후 디스플레이부(113)에 제공한다.In addition, the graphic processor 115 receives a decoded image (eg, a video call participant's image) from the video codec 117, and resolutions such that the received image can be displayed on a predetermined region of the display 113. After performing graphics processing such as a display position, a frame rate, and the like, the display unit 113 is provided to the display unit 113.

비디오 코덱(117) 그래픽 처리부(115) 또는 제어부(140)로부터 디지털 영상을 제공받고 이를 소정의 화상 통화 포맷으로 인코딩한 후 제어부(140)에 제공한다. 또한, 비디오 코덱(117)은 제어부(140)로부터 제공된 소정의 영상(예를 들면, 화상 통화 참여자들의 영상)을 제공받고 이를 디코딩한 후 그래픽 처리부(115)에 제공한다.The video codec 117 receives a digital image from the graphic processor 115 or the controller 140, encodes the digital image into a predetermined video call format, and provides the digital image to the controller 140. In addition, the video codec 117 receives a predetermined image (for example, a video call participant's image) provided from the controller 140, decodes the same, and provides the decoded image to the graphic processor 115.

비디오 코덱(117)은 예를 들어 H.261, H.263, H.264 및 MPEG-4 등과 같은 코덱을 이용하여 영상을 인코딩 및 디코딩할 수 있고, 화상 통화시에는 H.263, MPEG-4 simple profile level 0를 이용하여 영상을 인코딩 및 디코딩할 수 있다.The video codec 117 can encode and decode video using, for example, codecs such as H.261, H.263, H.264, and MPEG-4, and H.263, MPEG-4 during a video call. Simple profile level 0 can be used to encode and decode video.

헤드셋 연결부(120)는 마이크 입력 단자(미도시), 좌측 음향 출력 단자(127) 및 우측 음향 출력 단자(129)를 포함할 수 있고, 헤드셋(미도시)과 연결된다. The headset connection unit 120 may include a microphone input terminal (not shown), a left sound output terminal 127, and a right sound output terminal 129, and are connected to a headset (not shown).

헤드셋 연결부(120)는 오디오 코덱으로부터 제공된 화상 통화 참여자들의 음성을 좌측 음향 출력 단자(127) 및 우측 음향 출력 단자(129)를 통해 헤드셋으로 출력한다. 또한, 헤드셋 연결부(120)는 헤드셋에 설치된 마이크를 통해 입력된 사용자의 음성을 마이크 입력 단자(미도시)를 통해 수신하여 오디오 코덱으로 제공할 수 있다.The headset connection unit 120 outputs the voices of the video call participants provided from the audio codec to the headset through the left sound output terminal 127 and the right sound output terminal 129. In addition, the headset connection unit 120 may receive a user's voice input through a microphone installed in the headset through a microphone input terminal (not shown) and provide the audio codec.

마이크(121)는 화상 통화 및 음성 통화시 사용자의 음성을 입력받고 입력된 사용자의 음성을 이에 상응하는 전기신호로 변환하여 오디오 코덱(130)에 제공한다The microphone 121 receives a user's voice in a video call and a voice call, converts the input user's voice into an electric signal corresponding thereto, and provides the same to an audio codec 130.

스피커(123)는 오디오 코덱(130)으로부터 디코딩 및 아날로그 신호로 변환되고 소정의 레벨로 증폭된 음성 및/또는 음향 신호를 제공받고 이를 가청주파수 대역의 오디오 신호로 출력한다.The speaker 123 receives the audio and / or sound signal decoded and converted into an analog signal from the audio codec 130 and amplified to a predetermined level, and outputs the audio and / or audio signal in the audio frequency band.

오디오 코덱(130) 마이크(121)로부터 사용자의 음성에 상응하는 아날로그 형태의 전기신호를 제공받고 이에 상응하는 디지털 신호로 변환한 후 변환된 디지털 신호를 음성 통화 또는 영상 통화 전송 규격에 적합하도록 인코딩한 후 제어부(140)에 제공한다.The audio codec 130 receives an analog signal corresponding to the user's voice from the microphone 121 and converts it into a corresponding digital signal, and then encodes the converted digital signal to comply with a voice call or video call transmission standard. It is provided to the control unit 140 after.

또한, 오디오 코덱(130)은 제어부(140)로부터 소정의 음성 및/또는 음향 신호 및 음량 제어 신호를 제공받고 이에 기초하여 상기 소정의 음성 및/또는 음향 신호를 디코딩한 후 아날로그 신호로 변환한다. 이후, 오디오 코덱(130)은 상기 아날로그 신호로 변환된 소정의 음성 및/또는 음향 신호가 출력되는 좌측 채널 및 우 측 채널의 음량을 상기 음량 제어 신호에 기초하여 각각 조절한 후 헤드셋 연결부(120) 및/또는 스피커(123)에 제공한다.In addition, the audio codec 130 receives a predetermined voice and / or sound signal and a volume control signal from the control unit 140, and decodes the predetermined voice and / or sound signal based on the converted audio signal. Subsequently, the audio codec 130 adjusts the volume of the left channel and the right channel, from which the predetermined voice and / or sound signal converted into the analog signal, based on the volume control signal, respectively, and then connects the headset. And / or to the speaker 123.

상기와 같은 기능을 수행하기 위해 오디오 코덱(130)은 도 2에 도시된 바와 같이 보코더(vocoder)(131), 제1 디지털 아날로그 변환기(DAC: Digital to Analog Converter)(133), 제2 디지털 아날로그 변환기(135), 제1 증폭부(137) 및 제2 증폭부(139)를 포함할 수 있다. 여기서 상기 제1 디지털 아날로그 변환기(137) 및 제2 디지털 아날로그 변환기(139)는 하나의 디지털 아날로그 변환기(136)로 구성될 수도 있다.In order to perform the above functions, the audio codec 130 may include a vocoder 131, a first digital to analog converter (DAC) 133, and a second digital analog as shown in FIG. 2. The converter 135 may include a first amplifier 137 and a second amplifier 139. The first digital analog converter 137 and the second digital analog converter 139 may be configured as one digital analog converter 136.

보코더(131)는 제어부(140)로부터 채널 디코딩된 화상 통화 참여자들의 음성 데이터를 제공받고 제공받은 음성 데이터를 음성 신호로 복원한 후 복원된 음성 신호를 제1 디지털 아날로그 변환기(133) 및 제2 디지털 아날로그 변환기에 제공한다.The vocoder 131 receives the voice data of the video call participants who are channel decoded from the controller 140, restores the received voice data to the voice signal, and then restores the restored voice signal to the first digital to analog converter 133 and the second digital signal. Provide to analog converter.

제1 디지털 아날로그 변환기(133) 및 제2 디지털 아날로그 변환기(135)는 각각 보코더(131)로부터 제공된 디지털 형태의 음성 신호를 아날로그 음성 신호로 변환한 후 제1 증폭부(137) 및 제2 증폭부(139)에 제공한다.The first digital to analog converter 133 and the second digital to analog converter 135 convert an audio signal of the digital form provided from the vocoder 131 into an analog voice signal, respectively, and then the first amplifier 137 and the second amplifier. Provided at 139.

제1 증폭부(137)는 제어부(140)의 음량 제어 신호에 기초하여 제1 크기의 음량으로 음성 신호를 증폭한 후 좌측 채널 즉, 헤드셋 연결부(120)의 좌측 음향 출력 단자(127)에 증폭된 음성 신호를 제공한다.The first amplifier 137 amplifies the voice signal at a volume of a first magnitude based on the volume control signal of the controller 140 and then amplifies the left channel, that is, the left sound output terminal 127 of the headset connection unit 120. The voice signal.

제2 증폭부(139)는 제어부(140)의 음량 제어 신호에 기초하여 제2 크기의 음량으로 음성 신호를 증폭한 후 우측 채널 즉, 헤드셋 연결부(120)의 우측 음향 출 력 단자(129)에 증폭된 음성 신호를 제공한다.The second amplifying unit 139 amplifies the voice signal at a volume of a second size based on the volume control signal of the controller 140, and then, the second amplifying unit 139 is connected to the right audio output terminal 129 of the headset connection unit 120. Provides an amplified speech signal.

오디오 코덱(135)은 예를 들어, QCELP(QualComm Code Excited Linear Predictive Coding), EVRC(Enhanced Variable Rate Codec), G.711, G.723, G.723.1, G.728 등과 같은 코덱을 사용하여 음성의 인코딩 및 디코딩을 수행할 수 있다.The audio codec 135 is voiced using a codec such as, for example, QualComm Code Excited Linear Predictive Coding (QCELP), Enhanced Variable Rate Codec (EVRC), G.711, G.723, G.723.1, G.728, and the like. Encoding and decoding may be performed.

제어부(140)는 휴대용 단말기의 고유 기능인 음성 통화 및 화상 통화를 위한 제어 및 처리를 수행한다. 예를 들어, 제어부(140)는 비디오 코덱(117) 및 오디오 코덱(130)으로부터 제공된 영상 신호 및 음성 신호를 다중화 하고 다중화된 신호를 채널 코딩하여 베이스밴드 신호를 생성한 후 무선 송수신부(170)에 제공한다. 또한, 제어부(140)는 무선 송수신부(170)로부터 제공된 베이스밴드 신호를 채널 디코딩하고 역다중화한 후 역다중화된 영상 신호 및 음성 신호를 각각 비디오 코덱(117) 및 오디오 코덱(130)에 제공할 수 있다.The controller 140 performs control and processing for a voice call and a video call, which are inherent functions of the portable terminal. For example, the controller 140 multiplexes the video and audio signals provided from the video codec 117 and the audio codec 130 and generates a baseband signal by channel coding the multiplexed signal. To provide. In addition, the controller 140 may decode and demultiplex the baseband signal provided from the wireless transceiver 170 and provide the demultiplexed video and audio signals to the video codec 117 and the audio codec 130, respectively. Can be.

제어부(140)는 키입력부(160)로부터 다자간 화상 통화를 지시하는 이벤트 신호가 제공되면, 다자간 화상 통화에 참여하는 각 참여자의 휴대용 단말기와 다자간 화상 통화 호를 설정한다.When an event signal indicating a multi-party video call is provided from the key input unit 160, the controller 140 sets up a multi-party video call call with a portable terminal of each participant participating in the multi-party video call.

이후, 다자간 화상 통화 참여자 중 소정의 참여자가 음성 신호를 제공하는 경우에 제어부(140)는 상기 화자의 휴대용 단말기로부터 제공된 음성 데이터에서 호 구분자를 추출함으로써 복수의 화상 통화 참여자 중 음성 신호를 제공한 상기 화자를 식별한다.Subsequently, when a predetermined participant of the multi-party video call participant provides a voice signal, the controller 140 extracts a call separator from the voice data provided from the portable terminal of the speaker to provide the voice signal among the plurality of video call participants. Identifies the speaker.

그리고, 제어부(140)는 상기 음성 신호를 제공한 화자에 상응하는 음량 가중 치를 저장부(150)에서 독출하고 독출된 음량 가중치에 기초하여 오디오 코덱(130)에 소정의 음량 제어 신호를 제공함으로써 디스플레이부(113)에 표시된 다자간 화상 통화 참여자의 영상 중 상기 소정의 화자에 대한 영상이 구분될 수 있도록 한다.In addition, the controller 140 reads the volume weight value corresponding to the speaker providing the voice signal from the storage 150 and provides a predetermined volume control signal to the audio codec 130 based on the read volume weight. The video of the predetermined speaker may be distinguished from the video of the multi-party video call participant displayed in the unit 113.

상기 음량 제어 신호는 구체적으로 오디오 코덱(130)의 제1 증폭부(137) 및 제2 증폭부(139)에 제공될 수 있다.The volume control signal may be specifically provided to the first amplifier 137 and the second amplifier 139 of the audio codec 130.

도 3에 도시된 바와 같이 상기 음량 가중치는 화상 통화에 참여하는 각 화자별로 좌측 음향 출력 단자(127) 및 우측 음향 출력 단자(129)에 각각 출력되는 음량의 크기를 다르게 설정하여 헤드셋 연결부(120)를 통해 제공되는 각 화자의 음성에 대한 공간감을 다르게 함으로써 헤드셋을 통해 각 화자의 음성을 듣는 것 만으로 화자를 구분할 수 있도록 설정된다.As shown in FIG. 3, the volume weight is set by differently setting the volume of the volume output to the left sound output terminal 127 and the right sound output terminal 129 for each speaker participating in the video call. By varying the sense of space for each speaker's voice provided through the speaker, the speaker can be distinguished only by listening to each speaker's voice.

도 3에서는 다자간 화상 통화에 참여하는 참여자가 휴대용 단말기의 사용자를 제외한 4명(A, B, C 및 D)으로 예를 들어 도시하였고, 각 화자별로 헤드셋 연결부(120)의 좌측 음향 출력 단자(127) 및 우측 음향 출력 단자(129)에 출력되는 음량의 가중치를 백분율로 예를 들어 표시하였다.In FIG. 3, for example, four participants (A, B, C, and D) who participated in the multi-party video call except for the user of the portable terminal are shown, and the left sound output terminal 127 of the headset connection unit 120 for each speaker is illustrated. ) And the weight of the volume output to the right sound output terminal 129 as a percentage.

예를 들어, 도 3에서 화자(A)는 좌측 음향 출력 단자(127)를 통해 음성이 최대의 음량(즉, 100 퍼센트)으로 출력되고, 우측 음향 출력 단자(129)를 통해 음성이 최대 음량의 25 퍼센트만 출력되도록 설정되어, 다자간 화상 통화 중 화자(A)가 말을 하는 경우에는 상기 설정된 음량 가중치에 상응하는 음량으로 화자의 음성이 조절되어 헤드셋 연결부(120)로 출력된다.For example, in FIG. 3, the speaker A outputs the voice at the maximum volume (ie, 100 percent) through the left audio output terminal 127, and the voice is output at the maximum volume through the right audio output terminal 129. Since only 25 percent is set to be output, when the speaker A speaks during the multi-party video call, the speaker's voice is adjusted to a volume corresponding to the set volume weight and is output to the headset connection unit 120.

따라서, 사용자가 휴대용 단말기에 구비된 헤드셋 연결부(120)에 헤드셋을 연결하여 다자간 화상 통화를 수행할 경우, 사용자는 좌측 이어폰 및 우측 이어폰에서 들리는 음량의 차이에 의해 도 4에 도시된 바와 같이 휴대용 단말기의 디스플레이부(113)에 표시된 화상 통화 참여자의 영상 중 화자(A)의 영상을 용이하게 구분할 수 있게 된다.Therefore, when a user performs a multi-party video call by connecting a headset to the headset connection unit 120 provided in the portable terminal, the user may change the portable terminal as illustrated in FIG. 4 due to the difference in the volume heard from the left earphone and the right earphone. The video of the participant A among the video call participants' video displayed on the display 113 can be easily distinguished.

도 3 및 도 4에서는 다자간 화상 통화의 참여자가 4명(사용자 제외)인 것으로 예를 들어 설명하였으나, 다자간 화상 통화의 참여자가 4명 이상인 경우에도 각 참여자 별로 서로 다른 음량 가중치를 가지도록 설정될 수 있음은 물론이다.In FIGS. 3 and 4, four participants (excluding users) of the multi-party video call are described as examples. However, even when there are four or more participants in the multi-party video call, each participant may be set to have a different volume weight for each participant. Of course.

저장부(150)는 플래쉬(Flash) 메모리, EEPROM(Electrically Erasable And Programmable Read Only Memory) 등과 같은 비휘발성 메모리로 구성될 수 있고, 휴대용 단말기의 기본 동작에 필요한 시스템 프로그램(예를 들면 운영체제) 및/또는 기타 응용프로그램을 저장한다.The storage unit 150 may be configured of a nonvolatile memory such as a flash memory, an electrically erasable and programmable read only memory (EEPROM), a system program (for example, an operating system) required for basic operation of the portable terminal, and / Or save other applications.

또한, 저장부(150)에는 사용자에 의해 생성된 데이터가 저장될 수 있고, 상기 시스템 프로그램 및/또는 응용프로그램의 수행 중 발생되는 데이터가 저장될 수 있다.In addition, the storage 150 may store data generated by a user, and data generated during the execution of the system program and / or the application program.

특히, 저장부(150)에는 다자간 화상 통화시 각각의 화자를 구분하기 위한 음량 가중치 정보가 저장될 수 있다.In particular, the storage 150 may store volume weight information for distinguishing each speaker during a multi-party video call.

키입력부(160)는 복수의 숫자, 문자 입력 키 및 특수 기능을 수행하기 위한 기능 키를 포함하고, 사용자에 의해 키조작(예를 들면, 다자간 화상 통화 실행)이 발생하면 이에 상응하는 키입력 신호를 제어부(130)에 제공한다.The key input unit 160 includes a plurality of numbers, character input keys, and function keys for performing special functions, and corresponding key input signals when a key operation (for example, executing a multi-party video call) occurs by a user. It provides to the control unit 130.

무선 송수신부(170)는 안테나(171)를 통하여 수신된 무선 고주파(RF: Radio Frequency) 신호를 베이스 밴드(baseband) 신호로 변환하여 제어부(140)에 제공하고, 제어부(140)로부터 제공되는 베이스 밴드 신호를 무선 고주파 신호로 변환하여 안테나(171)를 통해 출력한다.The wireless transceiver 170 converts a radio frequency (RF) signal received through the antenna 171 into a baseband signal and provides it to the controller 140, and the base provided from the controller 140. The band signal is converted into a radio frequency signal and output through the antenna 171.

특히, 무선 송수신부(170)는 비디오 코덱(117) 및 오디오 코덱(130)에 의해 소정의 화상 통화 포맷으로 인코딩된 초당 소정 프레임의 화상 통화 신호를 다자간 화상 통화 참여자들의 휴대용 단말기에 송신하고, 다자간 화상 통화 참여자들의 휴대용 단말기에서 전송한 화상 통화 신호를 수신하여 제어부(140)에 제공한다.In particular, the wireless transceiver 170 transmits a video call signal of a predetermined frame per second encoded by the video codec 117 and the audio codec 130 into a predetermined video call format to a portable terminal of the multi-party video call participants. The video call signal transmitted from the mobile terminal of the video call participants is received and provided to the controller 140.

도 1 내지 도 4에 도시된 본 발명의 일 실시예에 따른 휴대용 단말기에서는 헤드셋 연결부(120)에 포함된 좌측 음향 출력 단자(127) 및 우측 음향 출력 단자(129)를 통해 출력되는 음량을 각각 제어함으로써 소정 화자를 구분하는 것으로 예를 들어 설명하였으나, 본 발명의 다른 실시예에서는 좌측 채널 및 우측 채널에 각각 연결된 두 개의 스피커를 구비하고, 각 스피커를 통해 출력되는 음량을 조절함으로 화자를 구분하도록 구성할 수도 있다.1 to 4, the portable terminal according to the exemplary embodiment of the present invention controls the volume output through the left sound output terminal 127 and the right sound output terminal 129 included in the headset connection unit 120, respectively. For example, the present invention has been described as distinguishing a predetermined speaker, but in another embodiment of the present invention, two speakers connected to the left channel and the right channel are respectively provided, and configured to distinguish the speaker by adjusting the volume output through each speaker. You may.

도 5는 본 발명의 일 실시예에 따른 다자간 화상 통화시 화자 구분 과정을 나타내는 흐름도이다.5 is a flowchart illustrating a speaker classification process in a multi-party video call according to an embodiment of the present invention.

도 5를 참조하면, 먼저 제어부(140)는 키입력부(160)로부터 다자간 화상 통화를 지시하는 이벤트 신호가 제공되면 다자간 화상 통화에 참여하는 참여자의 휴대용 단말기 사이에 다자간 화상 통화 호를 연결한다(단계 201).Referring to FIG. 5, first, when the event signal indicating the multi-party video call is provided from the key input unit 160, the control unit 140 connects the multi-party video call call between the mobile terminals of the participants participating in the multi-party video call (step S160). 201).

이후, 다자간 화상 통화 참여자 중 소정의 화자로부터 음성이 사용자의 휴대 용 단말기로 제공되면(단계 203), 제어부(140)는 상기 소정의 화자로부터 제공된 음성 데이터에서 호 구분자를 추출함으로써 복수의 화상 통화 참여자 중 음성을 제공한 상기 화자를 식별한다(단계 205).Thereafter, when a voice is provided to the user's portable terminal from a predetermined speaker among the multi-party video call participants (step 203), the controller 140 extracts a call separator from the voice data provided from the predetermined speaker, thereby providing a plurality of video call participants. The speaker who provided the voice is identified (step 205).

이후, 제어부(140)는 상기 음성 신호를 제공한 화자에 상응하는 음량 가중치를 저장부(150)에서 독출하고(단계 207), 독출된 음량 가중치에 기초하여 상기 화자의 음량을 조절한다(단계 209). 여기서, 제어부(140)는 상기 독출된 음량 가중치에 상응하는 음량 제어 신호를 오디오 코덱(130)에 제공함으로써 상기 화자의 음량을 조절할 수 있다.Thereafter, the controller 140 reads out the volume weight corresponding to the speaker who provided the voice signal from the storage unit 150 (step 207), and adjusts the volume of the speaker based on the read volume weight (step 209). ). Here, the controller 140 may adjust the volume of the speaker by providing a volume control signal corresponding to the read volume weight to the audio codec 130.

오디오 코덱(130)은 제어부(140)로부터 제공된 음량 제어 신호에 기초하여 상기 화자의 음량을 조절한 후 음량이 조절된 상기 화자의 음성을 헤드셋 연결부(120)에 제공한다(단계 211).The audio codec 130 adjusts the volume of the speaker based on the volume control signal provided from the controller 140 and provides the headset connection unit 120 with the speaker whose volume is adjusted (step 211).

이상 실시예를 참조하여 설명하였지만, 해당 기술 분야의 숙련된 당업자는 하기의 특허 청구의 범위에 기재된 본 발명의 사상 및 영역으로부터 벗어나지 않는 범위 내에서 본 발명을 다양하게 수정 및 변경시킬 수 있음을 이해할 수 있을 것이다.It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. It will be possible.

도 1은 본 발명의 일 실시예에 따른 다자간 화상 통화시 화자 구분 기능을 구비한 휴대용 단말기의 구성을 나타내는 블록도이다.1 is a block diagram showing the configuration of a portable terminal having a speaker discrimination function in a multi-party video call according to an embodiment of the present invention.

도 2는 도 1에 도시된 오디오 코덱의 상세한 구성을 나타내는 블록도이다.FIG. 2 is a block diagram illustrating a detailed configuration of the audio codec shown in FIG. 1.

도 3은 본 발명의 일 실시예에 따른 다자간 화상 통화시 디스플레이부에 표시되는 사용자 인터페이스 화면을 나타낸다.3 illustrates a user interface screen displayed on a display unit in a multi-party video call according to an embodiment of the present invention.

도 4는 본 발명의 일 실시예에 따른 다자간 화상 통화시 화자 구분 기능을 위해 설정된 화자별 음량 가중치를 나타낸다.4 illustrates volume weights for each speaker set for a speaker classification function in a multi-party video call according to an embodiment of the present invention.

<도면의 주요 부분에 대한 부호의 설명><Explanation of symbols for the main parts of the drawings>

113 : 디스플레이부 117 : 비디오 코덱113: display unit 117: video codec

120 : 헤드셋 연결부 130 : 오디오 코덱120: headset connection 130: audio codec

131 : 보코더 140 : 제어부131: vocoder 140: control unit

150 : 저장부 170 : 무선 송수신부150: storage unit 170: wireless transceiver

Claims

In a portable terminal having a multi-party video call function,

A storage unit for storing volume weight information corresponding to each speaker participating in the multi-party video call;

A controller configured to generate a volume control signal based on the read volume weight after reading a volume weight corresponding to the predetermined speaker from the storage unit when voice data is provided from a predetermined speaker; And

And a audio codec for adjusting the volume of the left channel and the volume of the right channel, respectively, on which the voice of the predetermined speaker is output based on the volume control signal.

The method of claim 1, wherein the volume weight is

And a speaker discrimination function for a multi-way video call, wherein the voice of the predetermined speaker is set for the left channel and the right channel, respectively.

The method of claim 1, wherein the control unit

When the voice data is provided from the predetermined speaker, the call separator included in the voice data is extracted to identify the predetermined speaker from each speaker participating in the multi-party video call. Portable terminal having a.

The method of claim 1, wherein the audio codec

A first amplifier adjusting a volume of the left channel based on the volume control signal; And

And a second amplifier configured to adjust a volume of the right channel based on the volume control signal.

The portable terminal of claim 1, wherein the portable terminal has a speaker discrimination function in the multi-party video call.

And a headset connection unit configured to receive a voice of the predetermined speaker whose volume of the left channel and the volume of the right channel are respectively adjusted.

Reading volume weights corresponding to the predetermined speaker when voice data is provided from a predetermined speaker;

Generating a volume control signal based on the read volume weight; And

And adjusting the volume of the left channel and the volume of the right channel, respectively, on which the voice of the predetermined speaker is output based on the generated volume control signal.

The method of claim 6, wherein the volume weight is

And a method for distinguishing a speaker during a multi-way video call, wherein the voice of the predetermined speaker is set for the left channel and the right channel, respectively.

The method of claim 6, wherein when the voice data is provided from the predetermined speaker, reading a volume weight corresponding to the predetermined speaker comprises:

And when the voice data is provided from the predetermined speaker, extracting a call separator included in the voice data to identify the predetermined speaker among each speaker participating in the multi-party video call. How to distinguish poetry speakers.

The method of claim 6, wherein the method of identifying a speaker in the multi-party video call is

And providing a voice of the predetermined speaker whose volume of the left channel and the volume of the right channel are respectively adjusted to a headset connection unit.