KR100861284B1

KR100861284B1 - Method for processing of voice during image communication and portable terminal having the same

Info

Publication number: KR100861284B1
Application number: KR1020070096389A
Authority: KR
Inventors: 김정호
Original assignee: (주)케이티에프테크놀로지스
Priority date: 2007-09-21
Filing date: 2007-09-21
Publication date: 2008-10-01

Abstract

A voice processing method for use during a video call, and a portable terminal having a voice processing function for use during a video call, are provided. They prevent the distortion or loss of voice caused by interference between the sender's voice and the receiver's voice during a video call, and thereby offer a high-quality voice call. When the sender's voice is input while the receiver's voice is being output, a controller (130) temporarily stores the sender's voice and transmits it to the receiver after the receiver's voice output stops. When the receiver's voice is received while the sender's voice is being input, the controller temporarily stores the received voice and outputs it after the sender's voice input stops. A storage unit (140) temporarily stores at least one of the sender's voice and the receiver's voice. An audio codec (125) receives a voice or sound signal from the controller, decodes it, converts it into an analog signal, and provides the analog signal to a speaker (123).

Description

METHOD FOR PROCESSING OF VOICE DURING IMAGE COMMUNICATION AND PORTABLE TERMINAL HAVING THE SAME

The present invention relates to a portable terminal and, more particularly, to a method for processing voice during a video call that can be applied to a portable terminal having a video call function, and to a portable terminal having a function for processing voice during a video call.

As successive generations of mobile communication technology have made video calling commercially available, mobile service has moved beyond voice calls and short messages: users can now talk while seeing the other party on a portable terminal, regardless of location.

A video call is a service in which the camera of the portable terminal captures the caller, the captured video is processed to conform to the video call transmission standard and transmitted to the counterpart, and the video signal received from the counterpart is converted for display and shown on the caller's terminal, so that the two parties can talk while seeing each other.

Because a video call requires the user's image to be captured, the user must hold the terminal at least a certain distance away during the call. A user who is not wearing a headset with earphones and a microphone therefore exchanges voice with the counterpart through the terminal's built-in speaker and microphone from that distance.

For this reason, a typical portable terminal with a video call function raises its speaker output during a video call above the level used for a voice call.

Because a typical portable terminal is small, its speaker and microphone are close together, and raising the speaker output during a video call can cause an echo, in which the voice output from the speaker is picked up by the microphone. Typical portable terminals include an echo canceller to prevent this. However, when the sender and the receiver speak at the same time during a video call, the echo canceller distorts or removes voice, so the voice is cut off and is not delivered properly to the counterpart.

Accordingly, a first object of the present invention is to provide a method for processing voice during a video call that gives the parties to the call good voice quality.

A second object of the present invention is to provide a portable terminal having a function for processing voice during a video call that gives the parties to the call good voice quality.

A method for processing voice during a video call according to one aspect of the invention, for achieving the first object, includes temporarily storing the sender's voice when it is input while the receiver's voice is being output, and transmitting the temporarily stored sender's voice to the receiver after the receiver's voice output has stopped. In the storing step, whether the receiver's voice is being output may be determined based on whether voice data is being decoded. Also in the storing step, the stored data may be the sender's voice converted into digital data and then compressed with a predetermined codec. In the transmitting step, the receiver's voice output may be judged to have stopped when the receiver's silent interval lasts for a preset time or longer.

A method for processing voice during a video call according to another aspect of the invention, for achieving the first object, includes temporarily storing the receiver's voice when it is received while the sender's voice is being input, and outputting the temporarily stored receiver's voice after the sender's voice input has stopped. The storing step may include determining whether the sender's voice is being input by either detecting a signal input at or above a predetermined level or detecting a signal in the frequency band corresponding to voice. In the storing step, the receiver's voice may be decoded and the decoded voice temporarily stored. In the outputting step, the sender's voice input may be judged to have stopped when the silent interval in the sender's voice lasts for a preset time or longer.

A portable terminal having a function for processing voice during a video call according to one aspect of the invention, for achieving the second object, includes a controller and a storage unit. When the sender's voice is input while the receiver's voice is being output, the controller temporarily stores the sender's voice and transmits it to the receiver after the receiver's voice output has stopped. When the receiver's voice is received while the sender's voice is being input, the controller temporarily stores the received voice and outputs it after the sender's voice input has stopped. The storage unit has a buffer in which at least one of the sender's voice and the receiver's voice is temporarily stored. The controller may determine whether the receiver's voice is being output based on whether voice data is being decoded. The controller may determine whether the sender's voice is being input by either detecting a signal input at or above a predetermined level or detecting a signal in the frequency band corresponding to voice. The portable terminal may further include an audio codec that converts the sender's voice into digital data and then compresses the digital data with a predetermined codec. The controller may judge that the receiver's voice output has stopped when the silent interval in the receiver's voice lasts for a preset time or longer, and that the sender's voice input has stopped when the silent interval in the sender's voice lasts for the preset time or longer.

With the method and the portable terminal described above, when the sender and the receiver speak at the same time after a video call is connected, the voice of whoever spoke first is processed, while the voice of whoever spoke later is temporarily stored in a buffer and processed only after the first speaker's voice has been handled. As a result, only one party's voice is processed at any given time.

This prevents the distortion or loss of voice caused by interference between the sender's voice and the receiver's voice during a video call, and so provides good voice quality.

The present invention can be modified in various ways and can have several embodiments; specific embodiments are illustrated in the drawings and described in detail below.

This is not intended to limit the invention to the specific embodiments; the invention should be understood to cover all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first component could be called a second component, and likewise a second component could be called a first component, without departing from the scope of the invention. The term "and/or" covers any combination of the listed related items as well as any single one of them.

Preferred embodiments of the invention are described in more detail below with reference to the accompanying drawings. The same reference numerals are used for the same components throughout the drawings, and repeated descriptions of the same components are omitted.

FIG. 1 is a block diagram showing the configuration of a portable terminal having a function for processing voice during a video call according to an embodiment of the present invention.

Referring to FIG. 1, a portable terminal according to an embodiment of the present invention includes a camera unit 111, a display unit 113, a graphics processor 115, a video codec 117, a microphone 121, a speaker 123, an audio codec 125, a controller 130, a storage unit 140, a key input unit 150, and a wireless transceiver 160.

The configuration of the camera unit 111 is well known and is therefore not shown in detail. It includes a lens, an image sensor, and an analog-to-digital converter. When an image capture or video call function is executed, the image sensor converts the optical signal of the subject entering through the lens into a corresponding electrical signal, and the analog-to-digital converter converts that electrical signal into a corresponding digital image, which is provided to the graphics processor 115.

The camera unit 111 may capture video at 15 or 30 frames per second under the control of the controller 130.

The display unit 113 may be a display device such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display. Based on the video signal provided by the graphics processor 115, it displays user interface elements such as the terminal's menus, operating status, and application screens.

In particular, during a video call the display unit 113 receives the image of the counterpart and/or the user from the graphics processor 115 and shows it in a predetermined display area. When a photograph is being taken, it shows the preview image and the captured image.

The graphics processor 115 receives the digital image from the camera unit 111, reduces the size of the image data by converting its color format, and provides the result to the video codec 117. For example, the graphics processor 115 may receive raw RGB (Red, Green, Blue) data from the camera unit 111 and encode it into the YCbCr 4:2:0 format.
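The following is a minimal sketch of how an RGB-to-YCbCr 4:2:0 conversion reduces the amount of data, offered only as an illustration. The patent does not specify the conversion matrix or the subsampling filter; the full-range BT.601 coefficients, the 2x2 averaging of chroma, the even image dimensions, and the function name `rgb_to_ycbcr420` are all assumptions.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr420(pixels, width, height):
    """Convert row-major (R, G, B) tuples into planar Y, Cb, Cr lists.

    Assumptions (not taken from the patent): full-range BT.601 coefficients,
    even width and height, and chroma subsampled by averaging 2x2 blocks.
    """
    y_plane, cb_full, cr_full = [], [], []
    for r, g, b in pixels:
        y_plane.append(_clamp8(0.299 * r + 0.587 * g + 0.114 * b))
        cb_full.append(128 - 0.168736 * r - 0.331264 * g + 0.5 * b)
        cr_full.append(128 + 0.5 * r - 0.418688 * g - 0.081312 * b)

    cb_plane, cr_plane = [], []
    for row in range(0, height, 2):
        for col in range(0, width, 2):
            # Indices of the four pixels that share one chroma sample.
            block = [row * width + col, row * width + col + 1,
                     (row + 1) * width + col, (row + 1) * width + col + 1]
            cb_plane.append(_clamp8(sum(cb_full[i] for i in block) / 4))
            cr_plane.append(_clamp8(sum(cr_full[i] for i in block) / 4))
    return y_plane, cb_plane, cr_plane


# A 2x2 image holds 12 RGB samples but only 6 samples in 4:2:0.
y, cb, cr = rgb_to_ycbcr420([(255, 255, 255)] * 4, 2, 2)
print(len(y) + len(cb) + len(cr))
```

Under these assumptions each 2x2 block keeps four luma samples and one sample of each chroma component, so the converted frame carries half as many samples as the RGB input.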

The graphics processor 115 also receives decoded video (for example, the counterpart's image) from the video codec 117, performs graphics processing such as adjusting resolution, display position, and frame rate so that the video can be shown in a predetermined area of the display unit 113, and then provides it to the display unit 113.

The video codec 117 receives a digital image from the graphics processor 115 or the controller 130, encodes it in a predetermined video call format, and provides it to the controller 130. The video codec 117 also receives video (for example, the counterpart's image) from the controller 130, decodes it, and provides it to the graphics processor 115.

The video codec 117 may encode and decode video using codecs such as H.261, H.263, H.264, and MPEG-4; during a video call it may use H.263 or MPEG-4 Simple Profile Level 0.

The microphone 121 picks up the user's voice during video and voice calls, converts it into a corresponding electrical signal, and provides the signal to the audio codec 125.

The speaker 123 receives from the audio codec 125 a voice and/or sound signal that has been decoded, converted to analog, and amplified to a predetermined level, and outputs it as an audio signal in the audible frequency band.

The audio codec 125 receives the analog electrical signal corresponding to the user's voice from the microphone 121, converts it into a corresponding digital signal, encodes the digital signal to conform to the voice call or video call transmission standard, and provides it to the controller 130.

The audio codec 125 also receives a voice and/or sound signal from the controller 130, decodes it, converts it to an analog signal, amplifies it, and provides it to the speaker 123.

The audio codec 125 may encode and decode voice using codecs such as QCELP (QualComm Code Excited Linear Predictive Coding), EVRC (Enhanced Variable Rate Codec), G.711, G.723, G.723.1, and G.728.

The controller 130 performs the control and processing needed for voice calls and video calls, the terminal's core functions. For example, the controller 130 multiplexes the video and audio signals provided by the video codec 117 and the audio codec 125, channel-codes the multiplexed signal to generate a baseband signal, and provides it to the wireless transceiver 160. The controller 130 may also channel-decode and demultiplex the baseband signal provided by the wireless transceiver 160, and provide the demultiplexed video and audio signals to the video codec 117 and the audio codec 125, respectively.

When the key input unit 150 provides an event signal requesting a video call, the controller 130 connects a video call to the counterpart's portable terminal. If the user (that is, the sender) and the counterpart (that is, the receiver) then speak at the same time, the controller processes the voice of whoever spoke first, temporarily stores the voice of whoever spoke later, and processes the stored voice only after the first speaker's voice has been handled, so that only one party's voice is processed at any given time.

To this end, the controller 130 may include a voice call control module 131.

Specifically, when the user's voice is input through the microphone 121 after the video call is connected, the voice call control module 131 checks whether the counterpart's voice is being output through the speaker 123. If it is, the module buffers the user's voice from the microphone 121 in the buffer 141 of the storage unit 140. Once the counterpart's voice output through the speaker 123 has stopped, the module processes the user's voice stored in the buffer 141 and transmits it to the counterpart's portable terminal.

The user's voice is stored in the buffer 141 until the counterpart's voice stops.

Likewise, when the counterpart's voice is received after the video call is connected, the voice call control module 131 checks whether the user's voice is being input through the microphone 121. If it is, the module stores the received counterpart's voice in the buffer 141 of the storage unit 140 and, after the user's voice input through the microphone 121 has stopped, may output the counterpart's voice stored in the buffer 141 through the speaker 123.
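The receive-side behavior described above can be sketched as follows. This is an illustrative sketch only: the class `ReceivePathBuffer`, its method names, and the representation of voice as a sequence of frames are hypothetical and do not appear in the patent. A `deque` stands in for the buffer 141.

```python
from collections import deque


class ReceivePathBuffer:
    """Holds back the counterpart's voice while the user is speaking.

    Hypothetical sketch; the frame-based interface is an assumption.
    """

    def __init__(self):
        self._buffer = deque()  # stands in for buffer 141

    def on_remote_frame(self, frame, user_is_speaking):
        """Return the frames that should be played through the speaker now."""
        if user_is_speaking:
            # The user spoke first: store the counterpart's voice for later.
            self._buffer.append(frame)
            return []
        # Nobody is competing: play anything still buffered, then this frame.
        ready = list(self._buffer)
        self._buffer.clear()
        ready.append(frame)
        return ready

    def on_user_stopped(self):
        """Return the buffered frames once the user's voice input has stopped."""
        ready = list(self._buffer)
        self._buffer.clear()
        return ready


rx = ReceivePathBuffer()
rx.on_remote_frame("r1", user_is_speaking=True)   # buffered, nothing played
rx.on_remote_frame("r2", user_is_speaking=True)   # buffered, nothing played
print(rx.on_user_stopped())                       # buffered frames, in order
```

Releasing the frames in arrival order keeps the counterpart's speech intact; it is simply delayed until the user has finished talking.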

The voice call control module 131 may determine that the user's voice is being input through the microphone 121, and that the counterpart's voice is being received, based on the signals processed by the audio codec 125. To account for ambient noise picked up by the microphone 121 along with the user's voice, the module may decide whether the user's voice is being input based on whether a signal at or above a predetermined level is present, or by detecting a signal in a predetermined frequency band corresponding to voice (for example, 300 Hz to 3.4 kHz). When data is being decoded by the audio codec 125, the module may determine that the counterpart's voice has been received.
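The two detection options named above, a level threshold and a voice-band check, could look roughly like the sketch below. Only the 300 Hz to 3.4 kHz band comes from the text. The 8 kHz sampling rate, the RMS threshold of 500, the 0.6 in-band energy ratio, the use of a plain DFT, and all function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate


def rms_level(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def voice_band_ratio(frame, low_hz=300.0, high_hz=3400.0):
    """Fraction of the frame's non-DC energy that lies in the voice band.

    Uses a direct DFT, which is slow but dependency-free.
    """
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # skip the DC bin
        angle = 2 * math.pi * k / n
        re = sum(frame[t] * math.cos(angle * t) for t in range(n))
        im = sum(frame[t] * math.sin(angle * t) for t in range(n))
        power = re * re + im * im
        total += power
        if low_hz <= k * SAMPLE_RATE_HZ / n <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def voice_by_level(frame, threshold=500.0):
    """Option 1: a signal at or above a (hypothetical) level threshold."""
    return rms_level(frame) >= threshold


def voice_by_band(frame, min_ratio=0.6):
    """Option 2: most of the energy lies in the 300 Hz to 3.4 kHz band."""
    return voice_band_ratio(frame) >= min_ratio


# One 20 ms frame (160 samples) of a 1 kHz tone.
tone = [4000 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(160)]
print(voice_by_level(tone), voice_by_band(tone))
```

A level threshold alone cannot tell loud low-frequency noise from speech, and a band check alone ignores loudness, which may be why the text offers both as alternatives.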

When the counterpart's voice is being output through the speaker 123, the voice call control module 131 may receive the data processed by the audio codec 125 and store it in the buffer 141.

Here, the data processed by the audio codec 125 means the analog user voice input through the microphone 121 after it has been converted to digital form and compressed with a predetermined codec. Conversely, when the user's voice is being input, the voice call control module 131 may store in the buffer 141 the counterpart's voice as decoded by the audio codec 125.

The voice call control module 131 also monitors the talk spurts and silent intervals in the user's voice input through the microphone 121 and in the counterpart's voice output through the speaker 123. When a silent interval lasts for a preset time (for example, 500 ms) or longer, the module may determine that the user's voice or the counterpart's voice has stopped.
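A silence timer of this kind can be sketched as below. The 500 ms value is the example given in the text; the 20 ms frame duration, the class name `EndOfSpeechDetector`, and the per-frame boolean input are assumptions made for the sketch.

```python
class EndOfSpeechDetector:
    """Declares a speaker stopped once silence lasts for the timeout.

    Hypothetical sketch: frames arrive at a fixed duration (20 ms assumed),
    each already classified as voice or no voice by some other detector.
    """

    def __init__(self, silence_timeout_ms=500, frame_ms=20):
        self.silence_timeout_ms = silence_timeout_ms
        self.frame_ms = frame_ms
        self.silence_ms = 0      # length of the current silent interval
        self.talking = False     # True while inside a talk spurt

    def update(self, frame_has_voice):
        """Feed one frame; return True while the speaker counts as talking."""
        if frame_has_voice:
            self.talking = True
            self.silence_ms = 0
        elif self.talking:
            self.silence_ms += self.frame_ms
            if self.silence_ms >= self.silence_timeout_ms:
                # Silence reached the preset time: the voice has stopped.
                self.talking = False
                self.silence_ms = 0
        return self.talking


detector = EndOfSpeechDetector()
detector.update(True)                                # talk spurt begins
states = [detector.update(False) for _ in range(25)]  # 25 x 20 ms of silence
print(states[23], states[24])
```

Short pauses between words reset nothing permanently: any voiced frame within the timeout clears the silence counter, so only a pause of the full preset length ends the talk spurt.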

The voice call control module 131 may be implemented as a software program executed by a processor in the portable terminal, or as a separate, independent hardware chip.

The storage unit 140 may consist of non-volatile memory such as flash memory or EEPROM (Electrically Erasable and Programmable Read-Only Memory), and stores the system programs needed for the terminal's basic operation (for example, the operating system) and/or other application programs.

The storage unit 140 may also store data created by the user and data generated while the system programs and/or application programs run.

In particular, the storage unit 140 includes a buffer 141 of a predetermined size, which can temporarily store the user's voice or the counterpart's voice provided by the audio codec 125 under the control of the controller 130.

The key input unit 150 includes a plurality of number and character keys and function keys for special functions. When the user operates a key, it provides the corresponding key input signal to the controller 130.

The wireless transceiver 160 converts a radio frequency (RF) signal received through the antenna 161 into a baseband signal and provides it to the controller 130, and converts a baseband signal provided by the controller 130 into an RF signal and outputs it through the antenna 161.

In particular, the wireless transceiver 160 transmits to the counterpart's portable terminal a video call signal of a predetermined number of frames per second, encoded in a predetermined video call format by the video codec 117 and the audio codec 125. It also receives the video call signal transmitted by the counterpart's terminal and provides it to the controller 130.

FIG. 2 is a flowchart of the voice processing performed during a video call according to an embodiment of the present invention. It shows what happens when the user's voice is input while the counterpart's voice is being output through the speaker 123.

Referring to FIG. 2, when an event signal requesting a video call is provided, the controller 130 first connects a video call to the portable terminal with the designated number (step 201).

Then, when the voice of the user (that is, the sender) is input through the microphone 121 (step 203), the controller 130 checks whether the voice of the counterpart (that is, the receiver) is being output through the speaker 123 (step 205). The controller 130 may determine that the counterpart's voice is being output through the speaker if voice data from the counterpart is being decoded.

If it is determined in step 205 that the counterpart's voice is currently being output, the controller 130 stores the user's voice input through the microphone 121 in the buffer 141 provided in the storage unit 140 (step 207). Here, the controller 130 stores the data provided from the audio codec 125, that is, the data obtained by converting the analog user voice input through the microphone into digital form and then compressing it with a predetermined codec.

Thereafter, the controller 130 determines whether the counterpart's voice output has stopped (step 209). When it determines that the output has stopped, it processes the user's voice stored in the buffer 141 and transmits it to the counterpart's portable terminal (step 211).

Here, the controller 130 may monitor the talk spurts and silence sections of the counterpart's voice, and may determine that the counterpart's voice output has stopped when a silence section lasts for a preset time (for example, 500 ms) or longer.
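The silence-based stop decision described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name `SilenceMonitor`, the per-frame voice flag, and the 20 ms frame length are assumptions introduced here; only the 500 ms example threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class SilenceMonitor:
    """Tracks how long the monitored voice stream has been continuously silent."""

    threshold_ms: int = 500  # example value given in the description
    silent_ms: int = 0       # length of the current silence section

    def update(self, frame_is_voice: bool, frame_ms: int = 20) -> bool:
        """Feed one frame; return True once silence has lasted threshold_ms or more.

        frame_ms (20 ms here) is an assumed frame length, not taken from the text.
        """
        if frame_is_voice:
            self.silent_ms = 0          # a talk spurt ends the silence section
        else:
            self.silent_ms += frame_ms  # extend the current silence section
        return self.silent_ms >= self.threshold_ms
```

With 20 ms frames, the monitor reports a stop on the 25th consecutive silent frame, and any voice frame resets the count.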

If it is determined in step 205 that the counterpart's voice is not currently being output, the controller 130 processes the user's voice input through the microphone 121 in real time and transmits it to the counterpart's portable terminal (step 213).

Thereafter, the controller 130 determines whether an event signal instructing the end of the video call has been input (step 215). If such a signal has been input, the controller ends the voice processing procedure; if not, it returns to step 203 and performs the subsequent steps in sequence.
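The transmit-side flow of FIG. 2 (steps 203 to 213) can be sketched as below. This is an illustrative sketch under stated assumptions: the class `OutgoingVoicePath`, the `send` callback, and the method names are hypothetical, and flushing leftover buffered frames before a real-time frame is an ordering choice made here, not something the description specifies.

```python
from collections import deque
from typing import Callable, Deque


class OutgoingVoicePath:
    """Holds the user's encoded voice frames while the counterpart's voice is playing."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.send = send                       # hypothetical transmit callback
        self.buffer: Deque[bytes] = deque()    # plays the role of buffer 141

    def on_mic_frame(self, frame: bytes, remote_output_active: bool) -> None:
        """Handle one encoded microphone frame (steps 203 and 205)."""
        if remote_output_active:
            self.buffer.append(frame)          # step 207: store temporarily
        else:
            self.flush()                       # assumption: keep frames in order
            self.send(frame)                   # step 213: transmit in real time

    def on_remote_output_stopped(self) -> None:
        """Called when the counterpart's voice output is judged to have stopped (step 209)."""
        self.flush()                           # step 211: transmit the stored voice

    def flush(self) -> None:
        while self.buffer:
            self.send(self.buffer.popleft())
```

A caller would invoke `on_mic_frame` for every encoded frame and `on_remote_output_stopped` once the silence criterion is met, so buffered speech reaches the counterpart in its original order.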

FIG. 3 is a flowchart illustrating a voice processing method during a video call according to another embodiment of the present invention. It shows the processing performed when the voice of the video call counterpart is received while the user's voice is being input through the microphone 121.

Referring to FIG. 3, when an event signal instructing a video call is provided, the controller 130 first connects a video call with the portable terminal of the designated number (step 301).

Thereafter, when the voice of the video call counterpart is received (step 303), the controller 130 determines whether the user's voice is being input (step 305).

Here, taking into account the ambient noise that enters through the microphone together with the user's voice, the controller 130 may determine whether the user's voice is being input based on whether a signal at or above a predetermined level is input, or by detecting a signal in a predetermined frequency band corresponding to voice (for example, 300 Hz to 3.4 kHz).
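The two detection criteria mentioned above can be sketched as follows. This is only an illustration: the function names, the RMS threshold of 0.05, the 0.5 energy-ratio threshold, and the use of a plain DFT over one short frame are all assumptions made here; the description gives only the 300 Hz to 3.4 kHz band as an example.

```python
import cmath
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def band_energy_ratio(samples: Sequence[float], sample_rate: float,
                      lo_hz: float = 300.0, hi_hz: float = 3400.0) -> float:
    """Fraction of non-DC spectral power that lies between lo_hz and hi_hz.

    Uses a naive DFT, which is adequate for a short frame (e.g. 160 samples).
    """
    n = len(samples)
    total = in_band = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        freq = k * sample_rate / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= freq <= hi_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def is_voice_by_level(samples: Sequence[float], min_rms: float = 0.05) -> bool:
    """Level criterion: the frame is at or above an assumed RMS threshold."""
    return rms_level(samples) >= min_rms


def is_voice_by_band(samples: Sequence[float], sample_rate: float,
                     min_ratio: float = 0.5) -> bool:
    """Band criterion: most of the frame's power lies in the voice band."""
    return band_energy_ratio(samples, sample_rate) >= min_ratio
```

For a 20 ms frame sampled at 8 kHz, a 1 kHz tone satisfies the band criterion while a 100 Hz hum does not, which is the kind of separation from low-frequency noise the band check is meant to give.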

If it is determined in step 305 that the user's voice is currently being input, the controller 130 stores the received voice of the video call counterpart in the buffer 141 provided in the storage unit 140 (step 307).

Thereafter, the controller 130 determines whether the user's voice input has stopped (step 309). When it determines that the input has stopped, it processes the counterpart's voice stored in the buffer 141 and outputs it through the speaker 123 (step 311).

Here, the controller 130 may monitor the talk spurts and silence sections of the user's voice input, and may determine that the user's voice input has stopped when a silence section lasts for a preset time (for example, 500 ms) or longer.

If it is determined in step 305 that the user's voice is not currently being input, the controller 130 processes the received voice of the video call counterpart and outputs it through the speaker 123 in real time (step 313).

Thereafter, the controller 130 determines whether an event signal instructing the end of the video call has been input (step 315). If such a signal has been input, the controller ends the voice processing procedure; if not, it returns to step 303 and performs the subsequent steps in sequence.
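The receive-side loop of FIG. 3 (steps 303 to 313) can be simulated frame by frame as in the sketch below. The function `run_receive_path`, the tick representation, and the 20 ms frame length are hypothetical. The sketch also assumes that the user counts as "still speaking" until the silence has lasted `hold_ms`, and that the user is silent when the call starts; neither detail is fixed by the description.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


def run_receive_path(ticks: Iterable[Tuple[bool, Optional[str]]],
                     hold_ms: int = 500,
                     frame_ms: int = 20) -> List[Tuple[int, str]]:
    """Simulate the receive path over a sequence of frame ticks.

    Each tick is (user_is_speaking, received_frame_or_None).
    Returns (tick_index, frame) pairs in the order frames reach the speaker.
    """
    buffer: Deque[str] = deque()          # plays the role of buffer 141
    played: List[Tuple[int, str]] = []
    silent_ms = hold_ms                   # assumption: user is silent at call start

    for t, (user_speaking, rx) in enumerate(ticks):
        silent_ms = 0 if user_speaking else silent_ms + frame_ms
        user_input_active = silent_ms < hold_ms

        if not user_input_active:         # step 309: user input has stopped
            while buffer:                 # step 311: output the stored voice
                played.append((t, buffer.popleft()))

        if rx is not None:                # step 303: counterpart voice received
            if user_input_active:         # step 305
                buffer.append(rx)         # step 307: store temporarily
            else:
                played.append((t, rx))    # step 313: output in real time
    return played
```

Frames received while the user is talking are released together once the user's silence reaches `hold_ms`, after which playback continues in real time.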

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that the present invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

FIG. 2 is a flowchart illustrating a voice processing method during a video call according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a voice processing method during a video call according to another embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

111: camera unit 113: display unit

115: graphics processor 117: video codec

121: microphone 123: speaker

125: audio codec 130: controller

131: voice call control module 140: storage unit

141: buffer 150: key input unit

160: wireless transceiver

Claims

A voice processing method during a video call, the method comprising:

temporarily storing an input voice of a sender when the sender's voice is input while a receiver's voice is being output; and

transmitting the temporarily stored voice of the sender to the receiver after the voice output of the receiver has stopped.

The method of claim 1, wherein the temporarily storing of the input voice of the sender comprises

determining whether the receiver's voice is being output based on the presence or absence of decoded voice data.

The method of claim 1, wherein the temporarily storing of the input voice of the sender comprises

converting the sender's voice into digital data and then compressing the converted digital data using a predetermined codec.

The method of claim 1, wherein the transmitting of the temporarily stored voice of the sender to the receiver comprises

determining that the voice output of the receiver has stopped when a silent section of the receiver's voice lasts for a predetermined time or longer.

A voice processing method during a video call, the method comprising:

temporarily storing a received voice of a receiver when the receiver's voice is received while a sender's voice is being input; and

outputting the temporarily stored voice of the receiver after the voice input of the sender has stopped.

The method of claim 5, wherein the temporarily storing of the received voice of the receiver comprises

determining whether the sender's voice is being input using either one of a method of detecting a signal input at or above a predetermined level and a method of detecting a signal in a frequency band corresponding to voice.

Temporarily storing the decoded voice of the receiver after the receiver's voice has been decoded.

The method of claim 5, wherein the outputting of the temporarily stored voice of the receiver comprises

determining that the voice input of the sender has stopped when a silent section of the sender's voice lasts for a predetermined time or longer.

A portable terminal having a video call function and a voice processing function during a video call, the portable terminal comprising:

a controller that, when a sender's voice is input while a receiver's voice is being output, temporarily stores the sender's voice and transmits the temporarily stored voice of the sender to the receiver after the voice output of the receiver has stopped, and that, when the receiver's voice is received while the sender's voice is being input, temporarily stores the received voice of the receiver and outputs the temporarily stored voice of the receiver after the voice input of the sender has stopped; and

a storage unit having a buffer for temporarily storing at least one of the sender's voice and the receiver's voice.

The portable terminal of claim 9, wherein the controller

determines whether the receiver's voice is being output based on the presence or absence of decoded voice data.

The portable terminal of claim 9, wherein the controller

determines whether the sender's voice is being input using either one of a method of detecting a signal input at or above a predetermined level and a method of detecting a signal in a frequency band corresponding to voice.

The portable terminal of claim 9, further comprising

an audio codec that converts the sender's voice into digital data and then compresses the converted digital data using a predetermined codec.

The portable terminal of claim 9, wherein the controller

determines that the voice output of the receiver has stopped when a silent section of the receiver's voice lasts for a predetermined time or longer, and determines that the voice input of the sender has stopped when a silent section of the sender's voice lasts for the predetermined time or longer.