KR101626479B1

KR101626479B1 - Apparatus and method for transmitting and receiving voice packet

Info

Publication number: KR101626479B1
Application number: KR1020150052558A
Authority: KR
Inventors: 강인규
Original assignee: 라인 가부시키가이샤
Priority date: 2015-04-14
Filing date: 2015-04-14
Publication date: 2016-06-01

Abstract

The present invention provides an apparatus and a method for transmitting and receiving a voice packet, capable of adjusting the transmission bandwidth or transmission time of the voice packet by using feedback information. According to one embodiment of the present invention, the apparatus comprises: an input unit receiving first voice data from a user; a segmenting unit segmenting the first voice data into predetermined units; a first processing unit dividing the segmented first voice data into pitch units, detecting a duplicate pitch similar to an adjacent pitch included in the first voice data, and removing the duplicate pitch to reduce the length of the first voice data; a second processing unit encoding the first voice data from which the duplicate pitch has been removed to generate a first voice packet; a third processing unit encrypting the first voice packet by using a security key; and a communication unit transmitting the encrypted first voice packet to a receiver-side voice packet transmitting and receiving apparatus.

Description

APPARATUS AND METHOD FOR TRANSMITTING AND RECEIVING VOICE PACKET

The present invention relates to an apparatus and a method for transmitting and receiving a voice packet, and more particularly, to a voice packet transmission/reception apparatus and method that change the transmission bandwidth and transmission time of a voice packet using feedback information from a receiver-side voice packet transmission/reception apparatus.

Voice over Internet Protocol (VoIP) is a communication service in which voice calls are made over the Internet rather than over the existing public switched telephone network (PSTN). Unlike the conventional method, Internet-based calling uses a packet-based network, so no domestic or international telephone line charges are incurred and voice calls can be made at lower cost. VoIP can transmit not only audio but also video information using the H.323 protocol, a standard of the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector).

Korean Laid-Open Patent Publication No. 1984-0005793

Embodiments of the present invention provide a voice packet transmission/reception apparatus and method that adjust the transmission bandwidth or transmission time of a voice packet using feedback information from a receiver-side voice packet transmission/reception apparatus.

A voice packet transmission/reception apparatus according to an embodiment of the present invention may include: an input unit that receives first voice data from a user; a segmentation unit that segments the first voice data into predetermined units; a first processing unit that divides the segmented first voice data into pitch units, detects a duplicate pitch similar to an adjacent pitch included in the first voice data, and removes the duplicate pitch, thereby reducing the length of the first voice data; a second processing unit that encodes the first voice data from which the duplicate pitch has been removed to generate a first voice packet; a third processing unit that encrypts the first voice packet using a security key; and a communication unit that transmits the encrypted first voice packet to a receiver-side voice packet transmission/reception apparatus.

According to the present embodiment, the communication unit receives a second voice packet from the receiver-side voice packet transmission/reception apparatus, and the voice packet transmission/reception apparatus according to the embodiment of the present invention may further include a reception control unit that determines whether to increase or decrease the bandwidth or transmission time of the voice packet by comparing a packet time, which corresponds to the length of the second voice packet, with a preset reference time.

According to the present embodiment, the reception control unit may determine to decrease the bandwidth or transmission time of the voice packet when the packet time is less than the reference time, and may determine to increase the bandwidth or transmission time of the voice packet when the packet time exceeds the reference time.
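The comparison rule stated above can be sketched as follows. This is only an illustrative sketch: the names `Adjustment` and `decide_adjustment` are hypothetical, the millisecond unit is assumed, and the handling of the case where the packet time equals the reference time (treated here as "hold") is an assumption, since the text only specifies the "less than" and "exceeds" cases.

```python
from enum import Enum


class Adjustment(Enum):
    """Hypothetical labels for the reception control unit's decision."""
    DECREASE = "decrease"
    INCREASE = "increase"
    HOLD = "hold"


def decide_adjustment(packet_time_ms: float, reference_time_ms: float) -> Adjustment:
    """Compare the packet time of a received voice packet with a preset reference time.

    Follows the rule as stated in the text: below the reference time -> decrease,
    above the reference time -> increase. The equal case is an assumption.
    """
    if packet_time_ms < reference_time_ms:
        return Adjustment.DECREASE
    if packet_time_ms > reference_time_ms:
        return Adjustment.INCREASE
    return Adjustment.HOLD  # assumption: keep the current setting when equal


if __name__ == "__main__":
    print(decide_adjustment(15.0, 20.0))
    print(decide_adjustment(25.0, 20.0))
```

The decision is kept separate from the encoder and the pitch-removal stage so that either stage could consume it, as the embodiments describe.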

According to the present embodiment, the second processing unit may increase, decrease, or maintain the transmission bandwidth according to the decision on whether to increase or decrease the bandwidth or transmission time of the voice packet, and may encode the voice data according to that transmission bandwidth.

According to the present embodiment, the first processing unit may determine a removal ratio for the duplicate pitches included in the voice data according to the decision on whether to increase or decrease the bandwidth or transmission time of the voice packet, and may remove the duplicate pitches included in the voice data according to that removal ratio.

According to the present embodiment, the first processing unit may determine the similarity between a first pitch included in the voice data and a second pitch, which is a pitch unit adjacent to the first pitch, and, when the similarity is equal to or greater than a preset threshold ratio, may determine the second pitch to be a duplicate pitch and remove it.
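A minimal sketch of this adjacent-pitch comparison is given below. The text does not specify how similarity is measured, so normalized correlation (cosine similarity) of the sample values is used here purely as an assumed stand-in. The function names, the default threshold of 0.9, and the choice to compare each pitch unit against the most recently kept one are likewise illustrative assumptions.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure: cosine similarity over the common length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_duplicate_pitches(
    pitches: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Sequence[float]]:
    """Drop a pitch unit when it is at least `threshold` similar to the
    previously kept pitch unit (assumption: compare against the last kept one)."""
    kept: List[Sequence[float]] = []
    for pitch in pitches:
        if kept and similarity(kept[-1], pitch) >= threshold:
            continue  # treated as a duplicate pitch and removed
        kept.append(pitch)
    return kept


if __name__ == "__main__":
    units = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [3.0, -1.0, 0.0]]
    print(len(remove_duplicate_pitches(units)))
```

Removing a pitch unit shortens the voice data, which is what reduces the payload to be encoded and transmitted.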

According to the present embodiment, when a first pitch included in the voice data corresponds to a vowel, the first processing unit may determine the similarity between the first pitch and a second pitch, which is the pitch unit preceding the first pitch, and, when the similarity is equal to or greater than a preset threshold ratio, may determine the second pitch to be a duplicate pitch and remove it.

A voice packet transmission/reception method according to an embodiment of the present invention may include: receiving, by an input unit, first voice data from a user; segmenting, by a control unit, the first voice data into predetermined units; dividing, by the control unit, the segmented first voice data into pitch units, detecting a duplicate pitch similar to an adjacent pitch included in the first voice data, and removing the duplicate pitch, thereby reducing the length of the first voice data; encoding, by the control unit, the first voice data from which the duplicate pitch has been removed to generate a first voice packet; encrypting, by the control unit, the first voice packet using a security key; and transmitting, by a communication unit, the encrypted first voice packet to a receiver-side voice packet transmission/reception apparatus.

A computer program according to an embodiment of the present invention may be stored in a medium in order to execute, using a computer, any one of the voice packet transmission/reception methods according to the embodiments of the present invention.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium on which a computer program for executing the method is recorded, are further provided.

Other aspects, features, and advantages than those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to the present invention as described above, the transmission bandwidth or transmission time of a voice packet can be adjusted using feedback information from the receiver-side voice packet transmission/reception apparatus.

FIG. 1 is a diagram illustrating a voice packet transmission/reception system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a voice packet transmission/reception apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the structure of the control unit 110.
FIG. 4 is a diagram illustrating the structure of the transmission control unit 111.
FIG. 5 is a diagram illustrating the structure of the reception control unit 112.
FIGS. 6 to 8 are diagrams illustrating voice packet transmission/reception methods according to embodiments of the present invention.
FIG. 9 is a flowchart for explaining a data transmission/reception procedure between a receiver-side voice packet transmission/reception apparatus and a sender-side voice packet transmission/reception apparatus.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions thereof are omitted.

In the following embodiments, terms such as "first" and "second" are not used in a limiting sense but to distinguish one component from another.

In the following embodiments, singular expressions include plural expressions unless the context clearly indicates otherwise.

In the following embodiments, terms such as "include" or "have" mean that the features or components described in the specification are present, and do not preclude the possibility that one or more other features or components may be added.

Where an embodiment can be implemented differently, a specific process order may be performed differently from the order described. For example, two processes described in succession may be performed substantially simultaneously, or may be performed in the reverse of the order described.

FIG. 1 is a diagram illustrating a voice packet transmission/reception system 10 according to an embodiment of the present invention.

Referring to FIG. 1, the voice packet transmission/reception system 10 according to an embodiment of the present invention may include voice packet transmission/reception apparatuses 100 and 200 and a communication network 300.

The voice packet transmission/reception apparatuses 100 and 200 provide a means for transmitting and receiving voice data between the voice packet transmission/reception apparatuses of designated users in a wireless communication environment. In a voice call between a caller who has requested a voice call connection and a receiver, the voice packet transmission/reception apparatuses 100 and 200 may transmit and receive voice packets using the TCP/IP communication protocol. In addition, to account for the variable wireless communication environment between the caller and the receiver, the receiver transmits feedback information related to voice packet transmission and reception to the caller, and the caller may adjust the encoding method or compression method of the voice packet in consideration of the received feedback information. That is, the voice packet transmission/reception apparatuses 100 and 200 can reduce the resources used for voice packet transmission and reception by using the feedback information received from the receiver. More specifically, the voice packet transmission/reception apparatuses 100 and 200 can decrease or increase the transmission bandwidth or transmission time of the voice packet by using the feedback information received from the receiver.

Optionally, the voice packet transmission/reception apparatuses 100 and 200 may provide a means for transmitting and receiving voice data among a plurality of users. The voice packet transmission/reception apparatuses 100 and 200 may provide a means for transmitting voice data received from a first user among three or more users to the terminals of the remaining users other than the first user. In this case as well, the caller's voice packet transmission/reception apparatus 100 or 200 encodes or compresses the voice packet using the feedback information received from the plurality of receiver terminals.

When the network environment is poor, a conventional voice packet transmission/reception apparatus transmits and receives voice packets at a low bandwidth chosen in view of that environment. As a result, the receiver receives voice packets in which part or all of the voice data has been lost, and a smooth voice call over the wireless communication network is not possible.

The plurality of voice packet transmission/reception apparatuses 100 and 200 are communication terminals that can use a web service in a wired or wireless communication environment. Here, the voice packet transmission/reception apparatus 100 or 200 may be a user's personal computer 201 or a user's portable terminal 202.

In more detail, the voice packet transmission/reception apparatuses 100 and 200 may include a computer (for example, a desktop, laptop, or tablet), a media computing platform (for example, a cable or satellite set-top box, or a digital video recorder), a handheld computing device (for example, a PDA or an e-mail client), any form of mobile phone, or any form of other computing or communication platform, but the present invention is not limited thereto.

Meanwhile, the communication network 300 connects the plurality of voice packet transmission/reception apparatuses 100 and 200. That is, the communication network 300 is a network that establishes a communication channel between a first apparatus and a second apparatus among the plurality of voice packet transmission/reception apparatuses 100 and 200 and then provides a connection path so that voice data can be transmitted and received. The communication network 300 may encompass, for example, wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Service Digital Networks), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited thereto.

FIG. 2 is a block diagram illustrating the voice packet transmission/reception apparatuses 100 and 200 according to an embodiment of the present invention.

Referring to FIG. 2, the voice packet transmission/reception apparatus 100 or 200 according to an embodiment of the present invention may include a control unit 110, a communication unit 120, a display unit 130, an input unit 140, a storage unit 150, and a sound output unit 160.

The control unit 110 provides a function of converting voice data input through the input unit 140 into a voice packet and transmitting it to the receiver-side voice packet transmission/reception apparatus 100 or 200. To reduce the transmission time or transmission bandwidth of the voice packet, the control unit 110 may divide the voice data input through the input unit 140 into pitch units and then remove duplicate pitches. In addition, to prevent the voice packet from being delivered to any apparatus other than the receiver-side voice packet transmission/reception apparatus 100 or 200, the control unit 110 may encrypt the voice packet and then transmit the encrypted voice packet, so that only a receiver holding the actual encryption key can decrypt it.

The control unit 110 may determine the encoding method of the voice packet, or the degree of compression of the voice packet, in consideration of the feedback information received from the receiver-side voice packet transmission/reception apparatus 100 or 200. More specifically, the control unit 110 may decrease or increase the transmission bandwidth or transmission time of the voice packet. Here, there may be one receiver or a plurality of receivers.

Here, the control unit 110 may include any kind of device capable of processing data, such as a processor. A "processor" may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

The communication unit 120 may include one or more components that enable communication between the receiver-side voice packet transmission/reception apparatus 100 or 200 and the sender-side voice packet transmission/reception apparatus 100 or 200. For example, the communication unit 120 may include a short-range communication unit and a mobile communication unit. The short-range wireless communication unit may include a Bluetooth communication unit, a BLE (Bluetooth Low Energy) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, Infrared Data Association) communication unit, a WFD (Wi-Fi Direct) communication unit, a UWB (ultra wideband) communication unit, an Ant+ communication unit, and the like, but is not limited thereto. The mobile communication unit transmits and receives wireless signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network. Here, the wireless signals may include a voice call signal, a video call signal, or various types of data associated with the transmission and reception of text or multimedia messages.

The display unit 130 may display the user interface of an application that provides transmission and reception of voice data. When the display unit 130 and a touch pad form a layered structure and are configured as a touch screen, the display unit 130 may be used as an input device as well as an output device. The display unit 130 may include at least one of a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode, a flexible display, a three-dimensional (3D) display, and an electrophoretic display.

The input unit 140 is a means for inputting data for controlling the voice packet transmission/reception apparatus 100 or 200. For example, the input unit 140 may include a key pad, a dome switch, a touch pad (contact capacitive type, pressure resistive film type, infrared sensing type, surface ultrasonic conduction type, integral tension measurement type, piezo effect type, and the like), a jog wheel, a jog switch, and the like, but is not limited thereto.

The input unit 140 may acquire user input. For example, the input unit 140 may acquire a user event for the voice packet transmission/reception application, a scroll input, a direction key input, or a touch input that moves in a predetermined direction.

The input unit 140 is for inputting an audio signal or a video signal, and may include a camera, a microphone, and the like. The camera may obtain image frames, such as still images or moving images, through an image sensor in a video call mode or a photographing mode. An image captured through the image sensor may be processed by the control unit 110 or a separate image processing unit. The microphone receives an external acoustic signal and processes it into electrical voice data. For example, the microphone may receive an acoustic signal from an external device or a speaking person. The microphone may use various noise removal algorithms to remove noise generated while receiving the external acoustic signal.

The storage unit 150 may store programs for the processing and control of the control unit 110, and may also store input/output data (for example, a plurality of menus, a plurality of first-layer submenus corresponding to each of the plurality of menus, a plurality of second-layer submenus corresponding to each of the plurality of first-layer submenus, and the like).

The storage unit 150 may store metadata about the voice packet transmission/reception application in advance. The storage unit 150 may also store the length and type of acquired user input and information about the input voice data. The storage unit 150 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), a magnetic memory, a magnetic disk, and an optical disk. In addition, the user device 100 may operate web storage or a cloud server that performs the storage function of the memory 170 on the Internet.

The programs stored in the storage unit 150 may be classified into a plurality of modules according to their functions, for example, a UI module, a touch screen module, and a notification module.

The sound output unit 160 outputs audio data received from the communication unit 120 or stored in the memory. The sound output unit 160 also outputs sound signals related to the sound effects and background sounds included in the voice packet application. The sound output unit 160 may include a speaker, a buzzer, and the like.

The sound output unit 160 may further include a vibration motor (not shown). The vibration motor may output a vibration signal. For example, the vibration motor may output a vibration signal corresponding to the output of audio data or image data (for example, sound effects or background sounds included in cartoon data). The vibration motor may also output a vibration signal when a touch is input to the touch screen.

As shown in FIG. 3, the control unit 110 may include a transmission control unit 111 and a reception control unit 112. The transmission control unit 111 and the reception control unit 112 are described in detail in the descriptions of FIG. 4 and FIG. 5.

As shown in FIG. 4, the transmission control unit 111 may include a segmentation unit 1111, a first processing unit 1112, a second processing unit 1113, an encryption unit 1114, and a communication control unit 1115.

The segmentation unit 1111 segments the voice data input through the input unit 140 into predetermined units so that the voice data can be transmitted and received in real time between the sender and the receiver. Here, the predetermined unit may be based on a preset time or on a preset amount of data. For example, the segmentation unit 1111 can segment the voice data into units of 20 ms. That is, by transmitting and receiving voice packets segmented into predetermined time units, the voice packet transmitting and receiving apparatus 100, 200 according to an embodiment of the present invention achieves the effect of a real-time voice call.
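The time-based segmentation described above can be sketched as follows. This is a minimal illustration, not the implementation of the segmentation unit 1111: the function name `segment`, the 8 kHz sampling rate, and the representation of voice data as a list of samples are all assumptions.

```python
def segment(samples, sample_rate_hz=8000, unit_ms=20):
    """Split a sequence of PCM samples into consecutive fixed-duration units.

    The final unit may be shorter when the input is not a whole multiple
    of the unit length.
    """
    unit_len = sample_rate_hz * unit_ms // 1000  # samples per unit
    if unit_len <= 0:
        raise ValueError("unit is too short for this sampling rate")
    return [samples[i:i + unit_len] for i in range(0, len(samples), unit_len)]


# 400 samples at the assumed 8 kHz rate correspond to 50 ms of audio.
frames = segment(list(range(400)))
print([len(f) for f in frames])  # [160, 160, 80]
```

A unit based on a preset amount of data would use a fixed `unit_len` directly instead of deriving it from a duration.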

The first processing unit 1112 detects redundant pitches in the segmented voice data and removes them, which shortens the voice data and consequently reduces the transmission bandwidth or transmission time of the voice data. The first processing unit 1112 can determine the removal ratio of redundant pitches in consideration of feedback information received from the receiver, for example attribute information on the transmission bandwidth or transmission time. According to the removal ratio so determined, the first processing unit 1112 deletes the redundant pitches included in the voice data and generates voice data from which the redundant pitches have been removed. For example, on receiving feedback information indicating that the transmission bandwidth or transmission time is insufficient for transmitting and receiving voice packets, the first processing unit 1112 removes more redundant pitches and leaves only a minimum number of pitches. Conversely, on receiving feedback information indicating that the transmission bandwidth or transmission time is sufficient for transmitting and receiving voice packets, the first processing unit 1112 may leave the redundant pitches in place and can generate a voice packet that closely resembles the voice data actually input.
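One way to read the removal ratio is as a budget on how many of the detected redundant pitches are actually deleted. The sketch below rests on that assumption; the specification does not state how the ratio is applied, and the name `apply_removal_ratio` and the flag list are hypothetical.

```python
def apply_removal_ratio(pitches, redundant_flags, ratio):
    """Remove at most int(ratio * number_of_flagged) redundant pitch units.

    pitches         -- list of pitch units in their original order
    redundant_flags -- one boolean per pitch unit, True if it was judged redundant
    ratio           -- removal ratio in [0.0, 1.0], assumed to come from feedback
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    budget = int(ratio * sum(redundant_flags))
    kept = []
    for pitch, redundant in zip(pitches, redundant_flags):
        if redundant and budget > 0:
            budget -= 1  # spend one removal
            continue
        kept.append(pitch)
    return kept


units = ["a", "b", "c", "d"]
flags = [False, True, True, True]
print(apply_removal_ratio(units, flags, 1.0))  # insufficient bandwidth: drop all flagged
print(apply_removal_ratio(units, flags, 0.0))  # sufficient bandwidth: drop none
```

With a ratio of 1.0 only the minimum set of pitches remains, and with 0.0 the voice data is passed through unchanged, matching the two feedback cases described above.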

In particular, the first processing unit 1112 determines the similarity between a first pitch included in the segmented voice data and a second pitch, which is the pitch unit adjacent to the first pitch, and removes the second pitch when the similarity is equal to or greater than a preset threshold ratio.

The first processing unit 1112 may also calculate a difference value between a first pitch included in the segmented voice data and a second pitch, which is the pitch unit adjacent to the first pitch. When the difference value is equal to or less than a preset threshold difference, the first processing unit 1112 determines that the first pitch and the second pitch are highly similar voice data and removes the second pitch.
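The two criteria above (similarity at or above a threshold, or difference at or below a threshold) can be sketched as follows. The specification does not define the similarity or difference measure, so cosine similarity and mean absolute difference are assumptions, as are the function names and the default threshold.

```python
import math


def similarity(a, b):
    """Cosine similarity of two equal-length pitch units (1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a, b):
    """Mean absolute sample difference of two equal-length pitch units."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def remove_redundant(pitches, min_similarity=0.95):
    """Drop each pitch unit that is too similar to the most recently kept unit."""
    kept = []
    for pitch in pitches:
        if (kept and len(pitch) == len(kept[-1])
                and similarity(kept[-1], pitch) >= min_similarity):
            continue  # redundant "second pitch": remove it
        kept.append(pitch)
    return kept


p = [0.0, 1.0, 0.0, -1.0]
q = [0.0, 0.9, 0.1, -1.0]   # nearly the same shape as p
r = [1.0, 0.0, -1.0, 0.0]   # different shape
print(remove_redundant([p, q, r]) == [p, r])  # True
```

Comparing against the last kept unit collapses a run of near-identical pitches into a single one. The difference-based variant would replace the similarity test with `mean_abs_diff(kept[-1], pitch) <= max_diff` for some assumed threshold `max_diff`.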

The first processing unit 1112 may also extract a first pitch included in the segmented voice data, search the segmented voice data for a second pitch similar to the first pitch, and determine the similarity between the first pitch and the second pitch. Based on that similarity, the first processing unit 1112 compares first voice data starting at the first pitch with second voice data starting at the second pitch, determines the similarity between the first voice data and the second voice data, and can remove the second voice data when that similarity is equal to or greater than a preset threshold similarity.

In another embodiment, the first processing unit 1112 determines whether a first pitch included in the segmented voice data satisfies a predetermined condition. When the first pitch satisfies the condition, the first processing unit 1112 determines the similarity between the first pitch and a second pitch adjacent to the first pitch and can remove the second pitch based on the similarity. For example, when the first pitch corresponds to silence data or to vowel data, the first processing unit 1112 determines the similarity between the first pitch and the adjacent second pitch and can remove the second pitch based on the similarity. That is, the voice packet transmitting and receiving apparatus 100, 200 according to an embodiment of the present invention can group adjacent pitches that are silence data or vowel data into one group and remove the redundant pitches among the pitches in that group.

In another embodiment, the first processing unit 1112 determines whether a first pitch included in the segmented voice data corresponds to silence (null) or to voice, and can remove a first pitch that corresponds to silence. For a first pitch that corresponds to voice, the first processing unit 1112 may additionally carry out the process of determining redundant pitches.
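The silence check in this embodiment can be sketched with a simple energy threshold. The specification does not say how silence is detected, so the mean-energy criterion, the threshold value, and the names `is_silence` and `drop_silence` are assumptions.

```python
def is_silence(pitch, threshold=0.01):
    """Treat a pitch unit as silence when its mean energy is below the threshold."""
    if not pitch:
        return True
    energy = sum(x * x for x in pitch) / len(pitch)
    return energy < threshold


def drop_silence(pitches, threshold=0.01):
    """Remove silent pitch units; the voiced units that remain would then go
    through the redundant-pitch check."""
    return [p for p in pitches if not is_silence(p, threshold)]


units = [[0.0, 0.0], [0.5, -0.5], [0.001, 0.0]]
print(drop_silence(units))  # only the voiced unit [0.5, -0.5] remains
```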

The second processing unit 1113 determines encoding conditions in consideration of the network environment and encodes the voice data according to those conditions. Here, the encoding conditions are the parameters needed to convert voice data to digital form and may include one of bandwidth, sampling frequency, quantization level, and bit rate. The bandwidth is the range of frequencies used to transmit and receive voice packets; the larger the bandwidth, the more voice packets can be transmitted at one time. The sampling frequency refers to the PAM (Pulse Amplitude Modulation) operation in which a continuous waveform is represented by pulse amplitudes taken at a fixed period. Quantization maps continuous amplitude values onto a finite number of amplitude values, that is, it represents a continuously varying value by discrete representative values, and the number of quantization bits refers to the representative values extracted in the same interval. The larger the number of quantization bits, the closer the converted data is to the original voice data. The bit rate is the data transmission rate, defined as the amount of data transmitted per second.
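The effect of the number of quantization bits on fidelity can be illustrated with a uniform quantizer. This is only an illustrative sketch: the specification does not name a particular quantizer, and the function name `quantize` and the amplitude range of -1.0 to 1.0 are assumptions.

```python
def quantize(x, bits):
    """Uniformly quantize x in [-1.0, 1.0] to 2**bits levels.

    Returns the reconstructed amplitude, i.e. the discrete representative
    value nearest to x. Inputs outside the range are clipped.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # spacing between representative values
    clipped = max(-1.0, min(1.0, x))
    index = round((clipped + 1.0) / step)
    return index * step - 1.0


for bits in (2, 4, 8):
    error = abs(quantize(0.3, bits) - 0.3)
    print(bits, error)  # the error shrinks as the number of bits grows
```

The quantization error is bounded by half the step size, so each additional bit roughly halves the worst-case error.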

The second processing unit 1113 also determines the bandwidth in consideration of the network situation and encodes the voice data according to that bandwidth. The second processing unit 1113 can decrease or increase the transmission bandwidth of the voice packet in consideration of the attribute information on the transmission bandwidth included in a received voice packet. For example, when the attribute information on the transmission bandwidth included in the voice packet contains information corresponding to a bandwidth decrease, that is, when the transmission bandwidth for voice packets is insufficient, the second processing unit 1113 encodes the voice data with a reduced bandwidth so that more voice data can be transmitted and received. When the attribute information on the transmission bandwidth included in the voice packet contains information corresponding to a bandwidth increase, that is, when the transmission bandwidth for voice packets is sufficient, there is no need to reduce the voice packet data, so the second processing unit 1113 encodes the voice data with an increased bandwidth.

The second processing unit 1113 also generates header information that includes the encoding conditions, and generates a voice packet containing the encoded voice data and the header information.
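A voice packet whose header carries the encoding conditions could be laid out as in the sketch below. The field set, field widths, and byte order are assumptions made for illustration; the specification does not define a wire format, and `build_packet` and `parse_packet` are hypothetical names.

```python
import struct

# Hypothetical header layout in network byte order (an assumption):
# packet number (uint32), sampling frequency in Hz (uint32),
# quantization bits (uint8), bit rate in bit/s (uint32), payload length (uint16).
HEADER = struct.Struct("!IIBIH")


def build_packet(packet_number, sample_rate_hz, quant_bits, bit_rate_bps, payload):
    """Prefix the encoded voice payload with a header carrying the encoding conditions."""
    header = HEADER.pack(packet_number, sample_rate_hz, quant_bits,
                         bit_rate_bps, len(payload))
    return header + payload


def parse_packet(packet):
    """Split a packet back into its header fields and its payload."""
    number, rate, bits, bit_rate, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    fields = {"packet_number": number, "sample_rate_hz": rate,
              "quant_bits": bits, "bit_rate_bps": bit_rate}
    return fields, payload


packet = build_packet(7, 8000, 16, 64000, b"\x01\x02\x03")
fields, payload = parse_packet(packet)
print(fields, payload)
```

Carrying the encoding conditions in the header lets the receiving side pick matching decoding parameters for each packet, which matters when the conditions change in response to feedback.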

The encryption unit 1114 encrypts the voice packet using a security key. Here, the security key is a key shared between the receiver-side device and the sender-side device when the voice call is connected. It may be generated based on the telephone number, identification information, ID, or IP information (Internet Protocol, network information) of the receiver or the sender, or based on the time at which the voice call was requested, identification information assigned to the voice call, and the like. The voice packet can be encrypted by any of various methods according to the state of the art and the common technical knowledge of those skilled in the art.

The communication control unit 1115 transmits the encrypted voice packet to the receiver-side voice packet transmitting and receiving apparatus 100, 200.

As shown in FIG. 5, the reception control unit 112 is a component included in the receiver-side voice packet transmitting and receiving apparatus 200 and may include a reception processing unit 1122, a decryption unit 1121, a decoding unit 1123, and an extension unit 1124.

The decryption unit 1121 decrypts the voice packet received through the communication unit 120 using the security key. Here, the security key is a key shared between the receiver-side device and the sender-side device when the voice call is connected. It may be generated based on the telephone number, identification information, ID, or IP information (Internet Protocol, network information) of the receiver or the sender, or based on the time at which the voice call was requested, identification information assigned to the voice call, and the like.

The reception processing unit 1122 analyzes the decrypted voice packet to determine attribute information on the transmission bandwidth or transmission time of the voice packet. In particular, the reception processing unit 1122 compares the packet time, which corresponds to the length of the decrypted voice packet, with a reference time to decide whether the transmission bandwidth or transmission time of the voice packet should be increased or decreased. Here, the reference time is the time information of a voice packet as sent, namely the time corresponding to the predetermined unit into which the voice data is segmented. For example, the voice packet transmitting and receiving apparatus 100, 200 converts voice data in units of 1 second or 20 ms into voice packets and transmits and receives them. In that case the minimum unit of the voice data, 1 second or 20 ms, is the reference time.

When the packet time is less than the reference time, the reception processing unit 1122 performs control to decrease the transmission bandwidth or transmission time for voice packet transmission and reception. That is, the reception processing unit 1122 decreases the transmission bandwidth of the voice packet when the bandwidth is in a range where it can be reduced further, and otherwise performs control to decrease the transmission time of the voice packet. When the packet time exceeds the reference time, the reception processing unit 1122 performs control to increase the transmission bandwidth for voice packet transmission and reception. Here, the reception processing unit 1122 generates feedback information containing information on the transmission bandwidth or transmission time of the voice packet, and can perform control so that the feedback information is included in a voice packet to be transmitted, or can use it in transmitting voice packets. When the reception processing unit 1122 has received a voice packet that includes feedback information, it converts the received voice packet into voice data in consideration of the included feedback information, and converts input voice data into a voice packet in consideration of the feedback information.
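The decision rule above can be summarized in a short sketch. The string codes, the `bandwidth_reducible` flag, and the handling of the case where the packet time equals the reference time (which the description does not cover) are assumptions.

```python
def make_feedback(packet_time_ms, reference_time_ms, bandwidth_reducible):
    """Return a feedback code from the comparison of packet time and reference time.

    bandwidth_reducible -- True when the transmission bandwidth can still be lowered
    """
    if packet_time_ms < reference_time_ms:
        # Prefer lowering the bandwidth; fall back to lowering the transmission time.
        return "DECREASE_BANDWIDTH" if bandwidth_reducible else "DECREASE_TIME"
    if packet_time_ms > reference_time_ms:
        return "INCREASE_BANDWIDTH"
    return "KEEP"  # equal times: not specified in the description, assumed no change


print(make_feedback(14, 20, True))   # DECREASE_BANDWIDTH
print(make_feedback(14, 20, False))  # DECREASE_TIME
print(make_feedback(25, 20, True))   # INCREASE_BANDWIDTH
```

In practice the code returned here would be encoded as an index or code in the next outgoing voice packet, as the description of the feedback information suggests.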

The decoding unit 1123 decodes the voice packet to generate voice data. The extension unit 1124 compares the time corresponding to the length of the decoded voice data with the reference time and extends the voice data. For example, when the time corresponding to the length of the voice data is less than 70 percent of the reference time, the extension unit 1124 divides the voice data into pitch units, analyzes the character of each pitch, and extends or increases the length of the pitches that correspond to vowels, thereby extending the voice data as a whole. In this way, the voice packet transmitting and receiving apparatus 100, 200 according to an embodiment of the present invention restores voice data similar to the sender's voice data.
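The extension step can be sketched as follows, assuming that a vowel pitch is lengthened by repeating it once and that a per-pitch vowel flag is already available. Both are assumptions: the specification states neither how vowels are identified nor how a pitch is lengthened, and the name `expand` is hypothetical.

```python
def expand(pitches, is_vowel, reference_len, min_fraction=0.7):
    """Lengthen decoded voice data by repeating vowel pitch units.

    pitches       -- list of pitch units (each a list of samples)
    is_vowel      -- one boolean per pitch unit, True for vowel pitches
    reference_len -- number of samples corresponding to the reference time
    min_fraction  -- extension is applied only below this fraction of reference_len
    """
    total = sum(len(p) for p in pitches)
    if total >= min_fraction * reference_len:
        return list(pitches)  # long enough: leave the data unchanged
    out = []
    for pitch, vowel in zip(pitches, is_vowel):
        out.append(pitch)
        if vowel and total < reference_len:
            out.append(pitch)  # repeat the vowel pitch once
            total += len(pitch)
    return out


units = [[1, 2], [3, 4], [5, 6]]
print(expand(units, [False, True, False], reference_len=10))
# [[1, 2], [3, 4], [3, 4], [5, 6]]
```

Repeating whole pitch periods keeps the waveform periodic at the splice points, which is why the sketch duplicates units instead of stretching samples.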

FIG. 6 is a flowchart illustrating a method of transmitting and receiving a voice packet according to a first embodiment of the present invention.

In S101, the voice packet transmitting and receiving apparatus 100, 200 receives voice data through the input unit 140.

In S102, the voice packet transmitting and receiving apparatus 100, 200 segments the input voice data into predetermined units.

In S103, the voice packet transmitting and receiving apparatus 100, 200 detects redundant pitches in the segmented voice data and removes them, which shortens the voice data and consequently reduces the transmission bandwidth or transmission time of the voice data. To remove the redundant pitches, the voice packet transmitting and receiving apparatus 100, 200 determines the similarity between a first pitch included in the segmented voice data and a second pitch, which is the pitch unit adjacent to the first pitch.

In S104, the voice packet transmitting and receiving apparatus 100, 200 determines the similarity between the first pitch included in the segmented voice data and the second pitch, which is the pitch unit adjacent to the first pitch, and compares the similarity with a preset threshold similarity.

In S105, when the comparison shows that the similarity exceeds the preset threshold similarity, the voice packet transmitting and receiving apparatus 100, 200 removes the similar voice data. More specifically, the voice packet transmitting and receiving apparatus 100, 200 removes the similar, redundant pitch.

In S106, the voice packet transmitting and receiving apparatus 100, 200 determines encoding conditions in consideration of the network environment and encodes the voice data according to those conditions.

In S107, the voice packet transmitting and receiving apparatus 100, 200 generates header information that includes the encoding conditions, and generates a voice packet containing the encoded voice data and the header information. The voice packet transmitting and receiving apparatus 100, 200 can generate the voice packet by adding the time information, packet number, and IP information of the packet to be transmitted as header information.

FIG. 7 is a flowchart illustrating a method of transmitting and receiving a voice packet according to a second embodiment of the present invention.

In S111, the voice packet transmitting and receiving apparatus 100, 200 receives voice data through the input unit 140.

In S112, the voice packet transmitting and receiving apparatus 100, 200 segments the input voice data into predetermined units.

In S113, the voice packet transmitting and receiving apparatus 100, 200 determines whether the segmented voice data corresponds to voice or to silence. When the voice data corresponds to voice, the voice packet transmitting and receiving apparatus 100, 200 determines the similarity between a first pitch of the segmented voice data and a second pitch adjacent to the first pitch (S114). When the voice data corresponds to silence, the voice packet transmitting and receiving apparatus 100, 200 can remove the voice data.

In S115, the voice packet transmitting and receiving apparatus 100, 200 determines whether the similarity exceeds the threshold similarity.

In S116, when the comparison shows that the similarity exceeds the preset threshold similarity, the voice packet transmitting and receiving apparatus 100, 200 removes the similar voice data. More specifically, the voice packet transmitting and receiving apparatus 100, 200 removes the similar, redundant pitch.

In S117, the voice packet transmitting and receiving apparatus 100, 200 determines encoding conditions in consideration of the network environment and encodes the voice data according to those conditions.

In S118, the voice packet transmitting and receiving apparatus 100, 200 generates header information that includes the encoding conditions, and generates a voice packet containing the encoded voice data and the header information. The voice packet transmitting and receiving apparatus 100, 200 can generate the voice packet by adding the time information, packet number, and IP information of the packet to be transmitted as header information.

FIG. 8 is a flowchart illustrating a method of transmitting and receiving a voice packet according to a third embodiment of the present invention.

In S201, the voice packet transmitting and receiving apparatus 100, 200 performs control to receive a voice packet through the communication unit 120.

In S202, the voice packet transmitting and receiving apparatus 100, 200 decrypts the received voice packet using the security key received when the call was connected.

In S203, the voice packet transmitting and receiving apparatus 100, 200 analyzes the voice packet to determine attribute information on the transmission bandwidth or transmission time of the voice packet. The voice packet transmitting and receiving apparatus 100, 200 compares the packet time, which corresponds to the length of the received voice packet, with the reference time to decide whether the transmission bandwidth or transmission time of the voice packet should be increased or decreased.

In S204, the voice packet transmitting and receiving apparatus 100, 200 disassembles the decrypted voice packet using the time information, packet number, and IP information of the packet. That is, the voice packet transmitting and receiving apparatus 100, 200 uses this packet information to combine one or more voice packets into one or to break a voice packet apart.

In S205, the voice packet transmitting and receiving apparatus 100, 200 decodes the voice packet to generate voice data.

In S206, the voice packet transmitting and receiving apparatus 100, 200 compares the time corresponding to the length of the decoded voice data with the reference time and extends the voice data.

FIG. 9 is a flowchart for explaining data transmission and reception between the sender-side voice packet transmitting and receiving apparatus and the receiver-side voice packet transmitting and receiving apparatus.

Although FIG. 9 shows the receiver-side voice packet transmitting and receiving apparatus 100, 200 and the sender-side voice packet transmitting and receiving apparatus 100, 200 connected directly, the two may also be connected through one or more relay servers (transit servers, not shown).

The sender inputs a voice call request through the sender-side voice packet transmitting and receiving apparatus 100, 200. The apparatus identifies the receiver from the receiver information included in the voice call request and generates a security key corresponding to the voice call with the receiver. Here, the security key is a key value used for encrypting or decrypting voice packets. It may be generated based on information about the sender or the receiver, such as identification information, a telephone number, or an ID, or it may be generated randomly.

The sender-side voice packet transmitting and receiving apparatus 100, 200 transmits the voice call request to the receiver-side voice packet transmitting and receiving apparatus 100, 200 corresponding to the receiver (S320). In response to the voice call request, the sender-side voice packet transmitting and receiving apparatus 100, 200 receives a signal accepting the voice call from the receiver-side voice packet transmitting and receiving apparatus 100, 200 (S325).

Once the signal accepting the voice call has been generated, a voice call is established between the sender-side voice packet transmitting and receiving apparatus 100, 200 and the receiver-side voice packet transmitting and receiving apparatus 100, 200. The sender-side voice packet transmitting and receiving apparatus 100, 200 receives first voice data from the sender (S330) and converts the first voice data into a first voice packet through segmentation, compression, encoding, encryption, and the like (S335). These processes are the same as described for FIG. 4, so a detailed description is omitted. The sender-side voice packet transmitting and receiving apparatus 100, 200 transmits the first voice packet to the receiver-side voice packet transmitting and receiving apparatus 100, 200 (S340). The receiver-side voice packet transmitting and receiving apparatus 100, 200 generates feedback information by comparing the length of the first voice packet with the reference time (S345). When the time corresponding to the length of the first voice packet is less than the reference time, the feedback information may indicate that the transmission bandwidth and transmission time of the voice packet should be decreased; when the time corresponding to the length of the first voice packet exceeds the reference time, the feedback information may indicate that the transmission bandwidth and transmission time should be increased. The feedback information may consist of indexes, codes, and similar information corresponding to each item of information under a predefined rule. The receiver-side voice packet transmitting and receiving apparatus 100, 200 converts the received first voice packet into first voice data and outputs the first voice data through the sound output unit 160 (not shown).

The receiver-side voice packet transmitting and receiving apparatus 100, 200 also receives second voice data (S350) and, in consideration of the generated feedback information, segments, compresses, encodes, and encrypts the second voice data to generate a second voice packet (S355). That is, when the transmission bandwidth or transmission time of voice packets needs to be adjusted (increased or decreased), the voice packet transmitting and receiving apparatus 100, 200 according to an embodiment of the present invention compresses and encodes the second voice data so that the transmission bandwidth or transmission time is adjusted accordingly. Here, compression means the process of removing redundant pitches. As in S340, the receiver-side voice packet transmitting and receiving apparatus 100, 200 transmits the second voice packet to the sender-side voice packet transmitting and receiving apparatus 100, 200 (S360). At this time, the receiver-side voice packet transmitting and receiving apparatus 100, 200 generates the second voice packet so that it includes the feedback information generated in S345. The sender-side voice packet transmitting and receiving apparatus 100, 200 decodes the second voice packet and converts it into second voice data in consideration of the received feedback information (S365).

The embodiments of the present invention described above may be implemented in the form of a computer program that can be executed on a computer through various components, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Furthermore, the medium may include an intangible medium implemented in a form that can be transmitted over a network, for example, a medium implemented as software or an application that can be transmitted and distributed through a network.

Meanwhile, the computer program may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer programs include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic configurations, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual apparatus they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a component that is necessarily required for the application of the present invention.

In the specification of the present invention (particularly in the claims), the use of the term "said" and similar referring terms may correspond to both the singular and the plural. In addition, when a range is described in the present invention, the invention includes embodiments to which individual values belonging to that range are applied (unless stated otherwise), and this is the same as if each individual value constituting the range were described in the detailed description of the invention. Finally, unless an order is explicitly stated for the steps constituting the method according to the present invention, or unless stated to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. The use of all examples or exemplary terms (for example, "and the like") in the present invention is merely to describe the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

10: Voice packet transmission/reception system
100, 200: Voice packet transmission/reception apparatus
300: Communication network

Claims

1. A voice packet transmission/reception apparatus comprising:
an input unit for receiving first voice data from a user;
a segmentation unit for segmenting the first voice data into predetermined units;
a first processing unit for dividing the segmented first voice data into pitch units, detecting a redundant pitch that is similar to an adjacent pitch included in the first voice data, and removing the redundant pitch, thereby reducing the length of the first voice data;
a second processing unit for generating a first voice packet by encoding the first voice data from which the redundant pitch has been removed;
a third processing unit for encrypting the first voice packet using a secret key; and
a communication unit for transmitting the encrypted first voice packet to a receiver-side voice packet transmission/reception apparatus.

2. The apparatus of claim 1,
wherein the communication unit receives a second voice packet from the receiver-side voice packet transmission/reception apparatus, and
wherein the voice packet transmission/reception apparatus further comprises a reception control unit for determining whether to increase or decrease the bandwidth or transmission time of the voice packet by comparing a packet time corresponding to the length of the second voice packet with a predetermined reference time.

3. The apparatus of claim 2,
wherein the reception control unit
determines to decrease the bandwidth or transmission time of the voice packet when the packet time is shorter than the reference time, and
determines to increase the bandwidth or transmission time of the voice packet when the packet time is longer than the reference time.

4. The apparatus of claim 3,
wherein the second processing unit
increases or decreases a transmission bandwidth according to the increase or decrease in the bandwidth or transmission time of the voice packet, and encodes the voice data according to the transmission bandwidth.

5. The apparatus of claim 3,
wherein the first processing unit
determines a removal ratio of the redundant pitch included in the voice data according to the increase or decrease in the bandwidth or transmission time of the voice packet, and removes the redundant pitch included in the voice data according to the removal ratio.

6. The apparatus of claim 1,
wherein the first processing unit
determines a similarity between a first pitch included in the voice data and a second pitch that is a pitch unit adjacent to the first pitch, determines the second pitch to be a redundant pitch when the similarity is equal to or greater than a predetermined threshold ratio, and removes the redundant pitch.

7. The apparatus of claim 1,
wherein the first processing unit,
when a first pitch included in the voice data corresponds to a vowel, determines a similarity between the first pitch and a second pitch that is the pitch unit preceding the first pitch, determines the second pitch to be a redundant pitch when the similarity is equal to or greater than a predetermined threshold ratio, and removes the redundant pitch.

8. A voice packet transmission/reception method comprising:
receiving first voice data from a user;
segmenting, by a control unit, the first voice data into predetermined units;
dividing, by the control unit, the segmented first voice data into pitch units, detecting a redundant pitch that is similar to an adjacent pitch included in the first voice data, and removing the redundant pitch, thereby reducing the length of the first voice data;
generating a first voice packet by encoding the first voice data from which the redundant pitch has been removed;
encrypting the first voice packet using a secret key; and
transmitting, by a communication unit, the encrypted first voice packet to a receiver-side voice packet transmission/reception apparatus.

9. A computer program stored in a computer-readable storage medium for executing the method of claim 8 using a computer.